100+ datasets found
  1. Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset

    • data.niaid.nih.gov
    Updated Nov 20, 2023
    Cite
    Hsu, Jonathan; Stoop, Allart (2023). Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10011621
    Dataset updated
    Nov 20, 2023
    Authors
    Hsu, Jonathan; Stoop, Allart
    Description

    Table of Contents

    1. Main Description
    2. File Descriptions
    3. Linked Files
    4. Installation and Instructions

    1. Main Description

    This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity." The code in the file marengo_code_for_paper_jan_2023.R was used to generate the figures from the single-cell RNA sequencing data. The following libraries are required for script execution:

    Seurat, scRepertoire, ggplot2, stringr, dplyr, ggridges, ggrepel, ComplexHeatmap

    File Descriptions

    The code can be downloaded and opened in RStudio. The "marengo_code_for_paper_jan_2023.R" file contains all the code needed to reproduce the figures in the paper. The "Marengo_newID_March242023.rds" file is available at the following address: https://zenodo.org/badge/DOI/10.5281/zenodo.7566113.svg (Zenodo DOI: 10.5281/zenodo.7566113). The "all_res_deg_for_heat_updated_march2023.txt" file contains the unfiltered results from the differential gene expression (DGE) analysis, which were also used to create the DGE heatmap and volcano plots. The "genes_for_heatmap_fig5F.xlsx" file contains the genes included in the heatmap in Figure 5F.

    Linked Files

    This repository contains code for the analysis of a single-cell RNA-seq dataset. The dataset contains raw FASTQ files as well as the aligned files that were deposited in GEO. The "Rdata" or "Rds" file was deposited in Zenodo. Provided below are descriptions of the linked datasets:

    Gene Expression Omnibus (GEO) ID: GSE223311 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)

    Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
    Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data.
    Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Sequence read archive (SRA) repository ID: SRX19088718 and SRX19088719

    Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
    Description: This submission contains the raw sequencing (.fastq.gz) files, which are gzip-compressed text files.
    Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Zenodo DOI: 10.5281/zenodo.7566113 (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)

    Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity.
    Description: This submission contains the "Rdata" or ".Rds" file, an R object file that is required to run the code.
    Submission type: Restricted Access. In order to gain access to the repository, you must contact the author.

    Installation and Instructions

    The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

    Ensure you have R version 4.1.2 or higher for compatibility.

    Although it is not essential, you can use RStudio (Version 2022.12.0+353) to access and execute the code.

    1. Download the "Rdata" or ".Rds" file from Zenodo (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ) (Zenodo DOI: 10.5281/zenodo.7566113).
    2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.
    3. Set your working directory to where the following files are located:

    • marengo_code_for_paper_jan_2023.R
    • Install_Packages.R
    • Marengo_newID_March242023.rds
    • genes_for_heatmap_fig5F.xlsx
    • all_res_deg_for_heat_updated_march2023.txt

    You can use the following code to set the working directory in R:

    setwd(directory)

    4. Open the file titled "Install_Packages.R" and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies, setting up an environment in which the code in "marengo_code_for_paper_jan_2023.R" can be executed.
    5. Once the "Install_Packages.R" script has executed successfully, restart RStudio or your IDE of choice.
    6. Open the file "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice.
    7. Execute the commands in "marengo_code_for_paper_jan_2023.R" to generate the plots.
  2. Repository for the single cell RNA sequencing data analysis for the human...

    • explore.openaire.eu
    Updated Aug 26, 2023
    Cite
    Jonathan; Andrew; Pierre; Allart; Adrian (2023). Repository for the single cell RNA sequencing data analysis for the human manuscript. [Dataset]. http://doi.org/10.5281/zenodo.8286134
    Dataset updated
    Aug 26, 2023
    Authors
    Jonathan; Andrew; Pierre; Allart; Adrian
    Description

    This is the GitHub repository for the single cell RNA sequencing data analysis for the human manuscript. The following essential libraries are required for script execution:

    Seurat, scRepertoire, ggplot2, dplyr, ggridges, ggrepel, ComplexHeatmap

    Linked Files

    This repository contains code for the analysis of a single-cell RNA-seq dataset. The dataset contains raw FASTQ files as well as the aligned files that were deposited in GEO. Provided below are descriptions of the linked datasets:

    1. Gene Expression Omnibus (GEO) ID: GSE229626
       Title: Gene expression profile at single cell level of human T cells stimulated via antibodies against the T Cell Receptor (TCR)
       Description: This submission contains the matrix.mtx, barcodes.tsv, and genes.tsv files for each replicate and condition, corresponding to the aligned files for single cell sequencing data.
       Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).
    2. Sequence read archive (SRA) repository
       Title: Gene expression profile at single cell level of human T cells stimulated via antibodies against the T Cell Receptor (TCR)
       Description: This submission contains the raw sequencing (.fastq.gz) files, which are gzip-compressed text files.
       Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Please note that since the GSE submission is private, the raw data deposited at SRA may not be accessible until the embargo on GSE229626 has been lifted.

    Installation and Instructions

    The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

    Ensure you have R version 4.1.2 or higher for compatibility.

    Although it is not essential, you can use RStudio (Version 2022.12.0+353) to access and execute the code.

    The following code can be used to set the working directory in R:

    setwd(directory)

    Steps:

    1. Download the "Human_code_April2023.R" and "Install_Packages.R" R scripts, and the processed data from GSE229626.
    2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.
    3. Set your working directory to where the following files are located:
       - Human_code_April2023.R
       - Install_Packages.R
    4. Open the file titled Install_Packages.R and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies.
    5. Open the Human_code_April2023.R script and execute commands as necessary.

  3. Gene Expression Cancer RNA-Seq

    • kaggle.com
    zip
    Updated May 27, 2025
    Cite
    Alban NYANTUDRE (2025). Gene Expression Cancer RNA-Seq [Dataset]. https://www.kaggle.com/datasets/waalbannyantudre/gene-expression-cancer-rna-seq-donated-on-682016
    Available download formats: zip (73984306 bytes)
    Dataset updated
    May 27, 2025
    Authors
    Alban NYANTUDRE
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    This collection of data is part of the RNA-Seq (HiSeq) PANCAN dataset. It is a random extraction of gene expressions of patients having different types of tumor: BRCA, KIRC, COAD, LUAD, and PRAD. Each sample contains the expression of 20,531 genes for a patient diagnosed with one of the following cancers:

    Code   Tumor Name
    BRCA   Breast invasive carcinoma (breast cancer)
    KIRC   Kidney renal clear cell carcinoma (kidney)
    COAD   Colon adenocarcinoma (colon)
    LUAD   Lung adenocarcinoma (lung)
    PRAD   Prostate adenocarcinoma (prostate)

    Files:

    • data.csv: Gene expression matrix X (881 samples × 20,531 genes)
    • label.csv: True class label for each sample y (881 labels)

    Source: UCI ML Repository – Gene Expression Cancer RNA-Seq Data

  4. ESPRESSO: Robust discovery and quantification of transcript isoforms from...

    • zenodo.org
    application/gzip, tsv
    Updated Oct 26, 2022
    Cite
    Robert Wang (2022). ESPRESSO: Robust discovery and quantification of transcript isoforms from error-prone long-read RNA-seq data (repository for simulated ONT RNA-seq data) [Dataset]. http://doi.org/10.5281/zenodo.7246437
    Available download formats: tsv, application/gzip
    Dataset updated
    Oct 26, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Robert Wang
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Simulated ONT direct RNA and 1D cDNA sequencing data of varying sequencing depths (0.5 million, 1 million, 3 million, and 5 million simulated reads) used for benchmark evaluations of transcript discovery and quantification in our paper "ESPRESSO: Robust discovery and quantification of transcript isoforms from error-prone long-read RNA-seq data". All details can be found in the Materials and Methods section of the paper.

    HEK293T_DirectRNA.transcriptome_quantification.tsv and HEK293T_1DcDNA.transcriptome_quantification.tsv are tab-separated files containing estimated raw read counts and normalized abundance values (in TPM) of transcripts annotated in GENCODE v34lift37. Transcript quantification was done using NanoSim (version 3.1.0).

    HEK293T_DirectRNA.NanoSim_500k.fastq.gz, HEK293T_DirectRNA.NanoSim_1M.fastq.gz, HEK293T_DirectRNA.NanoSim_3M.fastq.gz, and HEK293T_DirectRNA.NanoSim_5M.fastq.gz are gzip compressed FASTQ files containing 0.5 million, 1 million, 3 million, and 5 million simulated ONT direct RNA sequencing reads respectively.

    HEK293T_1DcDNA.NanoSim_500k.fastq.gz, HEK293T_1DcDNA.NanoSim_1M.fastq.gz, HEK293T_1DcDNA.NanoSim_3M.fastq.gz, and HEK293T_1DcDNA.NanoSim_5M.fastq.gz are gzip compressed FASTQ files containing 0.5 million, 1 million, 3 million, and 5 million simulated ONT 1D cDNA sequencing reads respectively.
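
    As a quick sanity check on the stated read depths, the number of records in a gzip-compressed FASTQ file can be counted with a short Python sketch like the one below. It assumes the common four-lines-per-read FASTQ layout (no wrapped sequence lines), which the listing does not confirm for these files; the function name `count_fastq_reads` and the demo file are illustrative.

```python
import gzip
import os
import tempfile


def count_fastq_reads(path):
    """Count records in a gzip-compressed FASTQ file.

    Assumes each read occupies exactly four lines
    (header, sequence, '+', qualities).
    """
    n_lines = 0
    with gzip.open(path, "rt") as handle:
        for _ in handle:
            n_lines += 1
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


# Self-contained demo on a two-read file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.fastq.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write("@read1\nACGU\n+\n!!!!\n@read2\nGGCA\n+\n!!!!\n")
    demo_count = count_fastq_reads(demo_path)
print(demo_count)
```

    Applied to a file such as HEK293T_DirectRNA.NanoSim_500k.fastq.gz, the result would be expected to be close to the depth given in the file name.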

  5. University of San Diego Public Data Repository

    • datacommons.cyverse.org
    Cite
    Carl Procko, University of San Diego Public Data Repository [Dataset]. https://datacommons.cyverse.org/browse/iplant/home/shared/USD_teaching_materials
    Dataset provided by
    CyVerse Data Commons
    Authors
    Carl Procko
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
    License information was derived automatically

    Description

    This folder contains Illumina sequencing read files for teaching RNA-seq in the undergraduate classroom using CyVerse tools. Subfolders include data from shade- and ABA hormone-treated Arabidopsis plants. For complete descriptions of the data sets and experimental conditions, see Procko et al. Genes Dev. 2016 Jul 1;30(13):1529-41. doi: 10.1101/gad.283234.116. and Song et al. Science. 2016 Nov 4;354(6312). doi: 10.1126/science.aag1550. For the shade-treated data sets, only duplicates from the same batch are included.

  6. RNA-Seq data from study: Molecular underpinnings of dedifferentiation and...

    • ega-archive.org
    Updated Sep 15, 2023
    Cite
    (2023). RNA-Seq data from study: Molecular underpinnings of dedifferentiation and aggressiveness in chromophobe renal cell carcinoma [Dataset]. https://ega-archive.org/datasets/EGAD50000000416
    Dataset updated
    Sep 15, 2023
    License

    https://ega-archive.org/dacs/EGAC50000000203

    Description

    The ChRCC study RNA-Seq dataset contains raw whole transcriptome sequencing data of 16 tumor and 6 adjacent normal samples from 7 UTSW patients who consented to depositing their genomic data in a public repository. RNA-Seq was performed with 50 bp single-end reads on a HiSeq2500 platform (Illumina, San Diego, CA, USA), yielding 50M reads per sample on average. The raw data is in FASTQ format.

  7. RNA-Seq-data

    • zenodo.org
    bin
    Updated Jan 24, 2020
    Cite
    Anna Syme (2020). RNA-Seq-data [Dataset]. http://doi.org/10.5281/zenodo.1311269
    Available download formats: bin
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anna Syme
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    RNA-seq data for analysing differential gene expression. The data are from bacteria (E. coli) and were subsampled to 1% of the original data size. The record contains six FASTQ files and two reference files (genome sequence and annotations).

  8. Field-wide assessment of differential HT-seq from NCBI GEO database

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Jan 13, 2023
    Cite
    Taavi Päll; Hannes Luidalepp; Tanel Tenson; Ülo Maiväli (2023). Field-wide assessment of differential HT-seq from NCBI GEO database [Dataset]. http://doi.org/10.5281/zenodo.7529832
    Available download formats: application/gzip
    Dataset updated
    Jan 13, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Taavi Päll; Hannes Luidalepp; Tanel Tenson; Ülo Maiväli
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    We analysed the field of expression profiling by high throughput sequencing, or HT-seq, in terms of replicability and reproducibility, using data from the NCBI GEO (Gene Expression Omnibus) repository.

    - This release includes GEO series published up to Dec-31, 2020.

    The geo-htseq.tar.gz archive contains the following files:

    - output/parsed_suppfiles.csv, p-value histograms, histogram classes, estimated number of true null hypotheses (pi0).

    - output/document_summaries.csv, document summaries of NCBI GEO series.

    - output/suppfilenames.txt, list of all supplementary file names of NCBI GEO submissions.

    - output/suppfilenames_filtered.txt, list of supplementary file names used for downloading files from NCBI GEO.

    - output/publications.csv, publication info of NCBI GEO series.

    - output/scopus_citedbycount.csv, Scopus citation info of NCBI GEO series

    - output/spots.csv, NCBI SRA sequencing run metadata.

    - output/cancer.csv, cancer related experiment accessions.

    - output/transcription_factor.csv, TF related experiment accessions.

    - output/single-cell.csv, single cell experiment accessions.

    - blacklist.txt, list of supplementary files that were either too large to import or crashed the computing environment during import.

    The workflow to produce this dataset is available on GitHub at rstats-tartu/geo-htseq.

    The geo-htseq-updates.tar.gz archive contains the following files:

    - results/detools_from_pmc.csv, differential expression analysis programs inferred from published articles

    - results/n_data.csv, manually curated sample size info for NCBI GEO HT-seq series

    - results/simres_df_parsed.csv, pi0 values estimated from differential expression results obtained from simulated RNA-seq data

    - results/data/parsed_suppfiles_rerun.csv, pi0 values estimated using smoother method from anti-conservative p-value sets
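
    To illustrate what a pi0 value (the estimated proportion of true null hypotheses) measures, here is a minimal Python sketch of a fixed-threshold estimator in the style of Storey: the share of p-values above a cutoff `lam`, rescaled by `1 - lam`. This is a simple stand-in chosen for illustration and is not necessarily the method used to produce the files above (the listing mentions a "smoother method" for some of them); the function name and the demo p-values are hypothetical.

```python
def estimate_pi0(p_values, lam=0.5):
    """Estimate the proportion of true nulls from a set of p-values.

    Uses the fixed-lambda estimator
        pi0 = #{p > lam} / (m * (1 - lam)),
    capped at 1.0. Under the null, p-values are uniform, so the region
    above `lam` is assumed to contain mostly true nulls.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    n_above = sum(1 for p in p_values if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


# Demo 1: evenly spread p-values, as expected when every null is true.
uniform_p = [(i + 0.5) / 100 for i in range(100)]
# Demo 2: half the tests have very small p-values (an anti-conservative set).
enriched_p = [0.001] * 50 + uniform_p[::2]

print(estimate_pi0(uniform_p), estimate_pi0(enriched_p))
```

    A value near 1 suggests few true effects, while lower values indicate a larger share of non-null tests.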

  9. Supplemental data repository for Skelly et al (2020) Cell Stem Cell:...

    • datasetcatalog.nlm.nih.gov
    Updated Jul 17, 2020
    Cite
    Baker, Christopher L.; Reinholdt, Laura G.; Skelly, Dan; pankratz, matthew; vincent, matthew; Munger, Steven C.; Gatti, Daniel M.; Churchill, Gary A.; Raghupathy, Narayanan; o'connor, callan; Keele, Gregory; Choi, Ted; Porter, Devin K.; Qin, Wenning; Czechanski, Anne M.; Byers, Candice; Spruce, Catrina; dion, stephanie; Harrill, Alison H.; martin, whitney; Stanton, Alexander R.; greenstein, ian; Choi, Kwangbom (2020). Supplemental data repository for Skelly et al (2020) Cell Stem Cell: "Mapping the effects of genetic variation on chromatin state and gene expression reveals loci that control ground state pluripotency". [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000545885
    Dataset updated
    Jul 17, 2020
    Authors
    Baker, Christopher L.; Reinholdt, Laura G.; Skelly, Dan; pankratz, matthew; vincent, matthew; Munger, Steven C.; Gatti, Daniel M.; Churchill, Gary A.; Raghupathy, Narayanan; o'connor, callan; Keele, Gregory; Choi, Ted; Porter, Devin K.; Qin, Wenning; Czechanski, Anne M.; Byers, Candice; Spruce, Catrina; dion, stephanie; Harrill, Alison H.; martin, whitney; Stanton, Alexander R.; greenstein, ian; Choi, Kwangbom
    Description

    Genomics of Diversity Outbred mouse embryonic stem cells

    Notes:

    - Any lines starting with # in these files are comments
    - Example code for analyses presented in the paper is available at https://github.com/daskelly/CellStemCell_2020_diverse_mESCs

    The files in this directory are:

    * CCRIX_qPCR.tsv - data supporting top panel of Figure S3
    * CCRIX_self_renewal.tsv - data supporting Figure 3E
    * Nr5a2_ChIP.txt - data supporting Figure 4H
    * counts_atac_norm_DO.tsv.gz - TMM-normalized counts for ATAC-Seq peaks called in Diversity Outbred samples
    * counts_rna_norm_DO.tsv.gz - upper quartile-normalized counts for RNA-Seq gene expression in Diversity Outbred samples
    * founder_nanog_flowcytometry.tsv.gz - data supporting Figure 1D
    * genotype_probs.Rds - genotype probabilities used for QTL mapping. Format is 3D array (dimensions are samples x founder haplotypes x pseudomarkers)
    * lifr_flowcytometry.tsv.gz - data supporting Figure S4C
    * luciferase_assay_results.txt - data supporting Figure 4C,I
    * quantitative_microscopy.tsv - data supporting Figure S1
    * rna_seq_counts_allele_swap_ESCs.tsv - un-normalized estimated read counts derived from RNA-Seq data processed using EMASE as described in Methods
    * rna_seq_counts_founder_ESCs.tsv - un-normalized estimated read counts derived from RNA-Seq data processed using EMASE as described in Methods
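
    Because lines starting with # in these files are comments, a reader has to skip them before parsing. A minimal Python sketch follows; it assumes, hypothetically, that the first non-comment line of a file is a tab-separated header row, which the description does not state. The function name `read_commented_tsv` and the demo content are illustrative.

```python
import csv
import io


def read_commented_tsv(handle):
    """Yield rows of a tab-separated file as dicts, skipping '#' comment lines.

    Assumption: the first non-comment line is the header row.
    """
    data_lines = (line for line in handle if not line.startswith("#"))
    yield from csv.DictReader(data_lines, delimiter="\t")


# In-memory stand-in for one of the .tsv files (made-up columns and values).
demo_tsv = io.StringIO(
    "# hypothetical comment line\n"
    "sample\tvalue\n"
    "DO_1\t3.5\n"
    "# another comment\n"
    "DO_2\t4.0\n"
)
demo_rows = list(read_commented_tsv(demo_tsv))
print(demo_rows)
```

    For the gzip-compressed files (.tsv.gz), the same function can be given a handle from `gzip.open(path, "rt")`.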

  10. RNA-seq analysis of mycobacteria stress response to microgravity

    • catalog.data.gov
    • datasets.ai
    • +3more
    Updated Aug 30, 2025
    Cite
    Open Science Data Repository (2025). RNA-seq analysis of mycobacteria stress response to microgravity [Dataset]. https://catalog.data.gov/dataset/rna-seq-analysis-of-mycobacteria-stress-response-to-microgravity-2ade4
    Dataset updated
    Aug 30, 2025
    Dataset provided by
    Open Science Data Repository
    Description

    The aim of this work is to determine whether mycobacteria have enhanced virulence during space travel and what mechanisms they use to adapt to microgravity. M. marinum and LHM4 were grown in high aspect ratio vessels (HARV) in a rotary cell culture system (RCCS) under normal gravity (NG) or low shear simulated microgravity (MG). To determine the effect of MG on the stress responses activated by the growth conditions, we used RNAseq to examine which genes were expressed. For RNAseq, the bacteria are harvested, RNA is isolated and converted to complementary DNA (cDNA), and the cDNA is sequenced. Using bioinformatics, the expression levels of the different M. marinum genes were compared between the NG and MG samples. To make sure that we were examining only gene expression changes due to MG, only bacteria in early exponential growth were used in the RNAseq studies. Triplicate NG and MG cultures were used to generate samples of bacteria grown for ~40 hrs. We also grew triplicate cultures for 4 days, then diluted them and grew them for another ~40 hrs so we could examine gene expression from bacteria exposed for a longer time. In summary, this study determined that waterborne mycobacteria alter their growth, expression of stress responses, and their sensitivity to oxidizing conditions when subjected to growth under MG.

  11. RNA-Seq profiles from the CheckMate-649 Clinical Trial

    • ega-archive.org
    Updated Feb 23, 2021
    Cite
    (2021). RNA-Seq profiles from the CheckMate-649 Clinical Trial [Dataset]. https://ega-archive.org/datasets/EGAD50000001105
    Dataset updated
    Feb 23, 2021
    License

    https://ega-archive.org/dacs/EGAC00001003376

    Description

    This dataset contains RNA sequencing (RNAseq) data of 814 patients from the CheckMate 649 clinical trial whose informed consent form (ICF) allows data deposition into a public repository. Gene expression profiling was performed retrospectively using RNAseq on a subset of baseline tumor samples. Paired-end FASTQ files were processed on the Seven Bridges platform (Seven Bridges Genomics).

  12. CWL run of RNA-seq Analysis Workflow (CWLProv 0.5.0 Research Object)

    • zenodo.org
    • explore.openaire.eu
    • +3more
    bin, zip
    Updated Jan 24, 2020
    Cite
    Farah Zaib Khan; Stian Soiland-Reyes (2020). CWL run of RNA-seq Analysis Workflow (CWLProv 0.5.0 Research Object) [Dataset]. http://doi.org/10.17632/xnwncxpw42.1
    Available download formats: zip, bin
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Farah Zaib Khan; Stian Soiland-Reyes
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This workflow adapts the approach and parameter settings of Trans-Omics for precision Medicine (TOPMed). The RNA-seq pipeline originated from the Broad Institute. There are in total five steps in the workflow starting from:

    1. Read alignment using STAR which produces aligned BAM files including the Genome BAM and Transcriptome BAM.
    2. The Genome BAM file is processed using Picard MarkDuplicates, producing an updated BAM file containing information on duplicate reads (such reads can bias downstream interpretation).
    3. SAMtools index is then employed to generate an index for the BAM file, in preparation for the next step.
    4. The indexed BAM file is processed further with RNA-SeQC which takes the BAM file, human genome reference sequence and Gene Transfer Format (GTF) file as inputs to generate transcriptome-level expression quantifications and standard quality control metrics.
    5. In parallel with transcript quantification, isoform expression levels are quantified by RSEM. This step depends only on the output of the STAR tool, and additional RSEM reference sequences.

    For testing and analysis, the workflow authors provided example data created by down-sampling the read files of a TOPMed public access dataset. Chromosome 12 was extracted from the Homo sapiens assembly 38 reference sequence and provided by the workflow authors. The required GTF and RSEM reference data files are also provided. The workflow is well documented, and a detailed set of instructions for the steps performed to down-sample the data is also provided for transparency. The availability of example input data, the use of containerization for the underlying software, and the detailed documentation were important factors in choosing this specific CWL workflow for CWLProv evaluation.

    This dataset folder is a CWLProv Research Object that captures the Common Workflow Language execution provenance, see https://w3id.org/cwl/prov/0.5.0 or use https://pypi.org/project/cwl

    Steps to reproduce

    To build the research object again, use Python 3 on macOS. Built with:

    • Processor 2.8GHz Intel Core i7
    • Memory: 16GB
    • OS: macOS High Sierra, Version 10.13.3
    • Storage: 250GB
    1. Install cwltool

      pip3 install cwltool==1.0.20180912090223
    2. Install git lfs
      The data download with the git repository requires the installation of Git lfs:
      https://www.atlassian.com/git/tutorials/git-lfs#installing-git-lfs

    3. Get the data and make the analysis environment ready:

      git clone https://github.com/FarahZKhan/cwl_workflows.git
      cd cwl_workflows/
      git checkout CWLProvTesting
      ./topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/download_examples.sh
    4. Run the following commands to create the CWLProv Research Object:

      cwltool --provenance rnaseqwf_0.6.0_linux --tmp-outdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp --tmpdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp topmed-workflows/TOPMed_RNAseq_pipeline/rnaseq_pipeline_fastq.cwl topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/Dockstore.json
      
      zip -r rnaseqwf_0.5.0_mac.zip rnaseqwf_0.5.0_mac
      sha256sum rnaseqwf_0.5.0_mac.zip > rnaseqwf_0.5.0_mac_mac.zip.sha256

    The https://github.com/FarahZKhan/cwl_workflows repository is a frozen snapshot from https://github.com/heliumdatacommons/TOPMed_RNAseq_CWL commit 027e8af41b906173aafdb791351fb29efc044120
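
    The checksum file written by the `sha256sum` command in step 4 can be checked on systems without that tool using a short Python sketch like the one below. It assumes the usual `sha256sum` output layout (hex digest, whitespace, file name); the helper names and the demo file are hypothetical, and the demo does not touch the actual research object archive.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_file(path, checksum_path):
    """Compare a file's digest with the first field of a sha256sum-style file."""
    with open(checksum_path) as handle:
        expected = handle.read().split()[0]
    return sha256_of_file(path) == expected.lower()


# Self-contained demo: write a small file and its checksum, then verify it.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "demo.zip")
    with open(archive, "wb") as out:
        out.write(b"demo bytes")
    with open(archive + ".sha256", "w") as out:
        out.write(f"{sha256_of_file(archive)}  demo.zip\n")
    demo_ok = matches_checksum_file(archive, archive + ".sha256")
print(demo_ok)
```

    Reading in chunks keeps memory use flat for large archives.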

  13. Random-primed mRNA-sequencing transcriptomic dataset for 70 primary human...

    • lincsportal.ccs.miami.edu
    • omicsdi.org
    tar.gz
    Cite
    DToxS (Icahn School of Medicine at Mount Sinai), Random-primed mRNA-sequencing transcriptomic dataset for 70 primary human cardiomyocyte cell samples [Dataset]. https://lincsportal.ccs.miami.edu/datasets/view/LDS-1587
    Available download formats: tar.gz
    Dataset authored and provided by
    DToxS (Icahn School of Medicine at Mount Sinai)
    Measurement technique
    RNA-seq gene expression profiling assay
    Description

    Each of the 70 cell samples, either at the control condition or treated with FDA-approved cancer drugs, was sequenced by the single-end random-primed mRNA-sequencing method with a read length of 100 base pairs, generating a total of 70 raw sequence data files in the FASTQ format. These sequence data files were then analyzed by a high-performance computational pipeline, and ranked lists of gene signatures and biological processes related to drug-induced cardiotoxicity were generated for each drug. The raw sequence datasets and the analysis results have been carefully controlled for data quality, and they are made publicly available at the Gene Expression Omnibus (GEO) database repository of NIH. As such, this broad drug-stimulated transcriptomic dataset is valuable for the prediction of drug toxicities and their mitigations.

  14. Data and metadata supporting the published article: Skeletal muscle...

    • springernature.figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    Hannah E. Wilson; David A. Stanton; Cortney Montgomery; Aniello M. Infante; Matthew Tayor; Hannah Hazard-Jenkins; Elena N. Pugacheva; Emidio E Pistilli (2023). Data and metadata supporting the published article: Skeletal muscle reprogramming by breast cancer regardless of treatment history or tumor molecular subtype. [Dataset]. http://doi.org/10.6084/m9.figshare.12248951.v1
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Hannah E. Wilson; David A. Stanton; Cortney Montgomery; Aniello M. Infante; Matthew Tayor; Hannah Hazard-Jenkins; Elena N. Pugacheva; Emidio E Pistilli
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Increased susceptibility to fatigue is a negative predictor of survival commonly experienced by women with breast cancer (BC). In this study, the authors sought to identify molecular changes induced in human skeletal muscle by BC regardless of treatment history or tumor molecular subtype using RNA-sequencing (RNA-seq) and proteomic analyses.

    Data access: The processed RNA-Seq and proteomics datasets generated during this study are publicly available in the figshare repository as part of this figshare data record: https://doi.org/10.6084/m9.figshare.12248951. The dataset ClinicalCharacteristics.xlsx is not publicly available in order to protect patient privacy, but will be made available on reasonable request from the corresponding author. The patients who took part in this study did not give consent to have their genetic data made publicly available, and therefore the raw transcriptomic and proteomics data are not publicly available. Raw RNA-Seq and proteomics data will be made available on reasonable request from the corresponding author to researchers who have completed a Data Usage Agreement. Corresponding author details: Dr. Emidio E. Pistilli, West Virginia University School of Medicine, email address: epistilli2@hsc.wvu.edu.

    Study approval and patient consent: The procedures in this study were reviewed and approved by the West Virginia University Institutional Review Board (IRB). Informed written consent was obtained from each subject or each subject’s guardian.

    Study aims and methodology: Muscle dysfunction in individuals with cancer is commonly thought to be a consequence of muscle atrophy, which is a major component of the paraneoplastic syndrome known as cancer cachexia. In this study, the authors tested the hypothesis that breast cancer induces a common molecular response in skeletal muscle that is independent of the molecular subtype of the tumor and the patient’s treatment history.

    A total of 71 female surgical patients provided informed consent for inclusion in this study (control n=20; BC n=51). Women with BC provided muscle biopsies from the pectoralis major muscle intraoperatively at the time of mastectomy, and control patients provided pectoralis major muscle samples intraoperatively during other breast surgeries. Women with BC were classified into four molecular subtypes based on immunohistochemical staining of their primary tumors:

    • ERPR (n=20): positive for estrogen receptor (ER) and progesterone receptor (PR)
    • HER2 (n=9): overexpression of HER2/neu in the absence of ER and PR expression
    • TN (n=11): triple negative, with absence of ER, PR, and HER2/neu expression
    • TP (n=11): triple positive, with presence of ER and PR expression and overexpression of HER2/neu

    Information on BMI at multiple time points was collected in 12 control and 50 BC patients. The following techniques are described in more detail in the published article: RNA sequencing, proteomics (including sample preparation, mass spectrometry, and mass spectrometry analysis), Western blotting, and patient muscle ATP quantification.

    Animal experiments were approved by the WVU Institutional Animal Care and Use Committee, and conducted in accordance with the Guidelines for Ethical Conduct in the Care and Use of Nonhuman Animals in Research. BC-PDOX mice were created by implanting human BC tumor fragments into the mammary fat pad of female NOD.CG-Prkdscid Il2rgtm1 Wjl/SzJ/ 0557 (NSG) mice (n=6).

    For the in vitro experiments, the following cell lines were used: EpH4-EV (immortalized normal murine mammary epithelium), EO771 (murine luminal BC), NF639 (murine HER2/neu-overexpressing BC), HEK293 (human embryonic kidney), and C2C12 (murine myoblasts).

    Data supporting the figures and supplementary tables in the published article: The following datasets are included in this data record:

    • 3000pts.csv in .csv file format
    • AlbuminAndWeightLoss.csv in .csv file format
    • ATPContentHuman.xlsx in .xlsx file format
    • ATPContentPDOX.xlsx in .xlsx file format
    • ATPProduction.xlsx in .xlsx file format
    • GFP.xlsx in .xlsx file format
    • RNASeqProteomicsCorrelation.xlsx in .xlsx file format, contains log-transformed gene and protein expression data for 8 patients with matched RNA-seq and proteomics data
    • Supplementary Data 3.xlsx in .xlsx file format
    • Supplementary Data1.xlsx in .xlsx file format
    • Supplementary Data2.xlsx in .xlsx file format
    • WBdata.xlsx

    Dataset ClinicalCharacteristics.xlsx contains clinical information on study patients (i.e. body composition, race, treatment history, etc.) and will be made available on request.

    Figure/Supplementary table supported by the datasets listed above:

    • Figure 1: SupplementaryData1.xlsx
    • Figure 2: AlbuminAndWeightLoss.csv, 3000pts.csv
    • Figure 3: SupplementaryData1.xlsx
    • Figure 4: SupplementaryData1.xlsx
    • Figure 5: SupplementaryData2.xlsx, WBdata.xlsx, SupplementaryData3.xlsx
    • Figure 6: SupplementaryData1.xlsx, ATPContentHuman.xlsx, ATPContentPDOX, ATPProduction.xlsx, GFP.xlsx
    • Supplementary table 1: SupplementaryData1.xlsx
    • Supplementary table 2: SupplementaryData2.xlsx
    • Supplementary table 3: SupplementaryData3.xlsx

  15. Synthetic bulk RNA-Seq transcriptomic profiles representing 10 Cancer...

    • search.dataone.org
    • datadryad.org
    Updated Oct 23, 2025
    Cite
    Shreyansh Priyadarshi; Camellia Mazumder; Sayan Biswas; Bhavesh Neekhra; Debayan Gupta; Shubhasis Haldar (2025). Synthetic bulk RNA-Seq transcriptomic profiles representing 10 Cancer hallmarks [Dataset]. http://doi.org/10.5061/dryad.zw3r228jc
    Dataset updated
    Oct 23, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Shreyansh Priyadarshi; Camellia Mazumder; Sayan Biswas; Bhavesh Neekhra; Debayan Gupta; Shubhasis Haldar
    Description

    Evidence before this study

    We conducted an extensive literature search using Google Scholar without language restrictions, employing search terms such as “(Predicting OR Classifying OR Annotating) and (cancer hallmarks) AND (Deep OR Machine Learning) OR (Artificial Intelligence OR AI).” Despite notable advances in molecular oncology and computational methodologies, a critical gap remains: no existing machine learning or deep learning framework comprehensively predicts cancer hallmarks from tumor biopsy samples. Current research primarily targets specific molecular pathways associated with individual hallmarks, leaving clinicians without an integrated model to interpret hallmark activity at the level of an individual tumor. Moreover, the absence of wet-lab techniques capable of annotating all cancer hallmarks in biopsy samples has further impeded progress, limiting the clinical utility of hallmark-related insights for precision oncology.

    Added value of this study

    This study introdu...

    Dataset Collection and Processing

    We utilized a large-scale dataset comprising 2.7 million single-cell transcriptomes derived from 14 tumor types, collected from 922 patients across 51 independent studies conducted globally. This dataset was sourced from the Weizmann Institute's 3CA repository.

    Quality Control

    Before generating synthetic datasets for model training, the raw single-cell transcriptomic data underwent a rigorous quality control (QC) process. Cells with over 15% mitochondrial transcript content, fewer than 200, or more than 6,000 expressed mRNA transcripts were excluded to ensure data reliability.

    Gene Set Curation

    Gene sets representing cancer hallmarks were compiled from multiple databases, retaining only genes identified in at least two independent sources. This selection was refined through manual literature reviews to exclude genes without direct or indirect roles in hallmark-related pathways.

    Digital Scoring

    Using the curated gene sets, Digital Scores were...

    Synthetic bulk RNA-Seq transcriptomic profiles representing 10 Cancer hallmarks

    https://doi.org/10.5061/dryad.zw3r228jc

    Description of the data and file structure

    Data Description: Experimental Efforts

    This dataset comprises single-cell transcriptomic data from the Weizmann 3CA repository, encompassing 2.7 million single-cell transcriptomes from 14 tumor types, collected from 922 patients across 51 global studies. The primary objective of the experimental efforts was to generate synthetic datasets for training and validating computational models to identify and analyze cancer hallmarks at the single-cell resolution.

    Single-cell RNA sequencing (scRNA-seq) data underwent a rigorous quality control process to ensure reliability and biological relevance. This included exclusion criteria based on mitochondrial transcript content (>15%) and mRNA transcript counts (<200 or >6,000 transcripts). Gene sets corresponding to 10 estab...
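
    The exclusion criteria above can be sketched as a simple per-cell filter. This is a minimal Python sketch, not the authors' pipeline: the CellQC record and its field names are hypothetical, and the boundary handling (a cell at exactly 15% mitochondrial content, or with exactly 200 or 6,000 transcripts, is kept) is an assumption based on the strict inequalities in the description.

```python
from dataclasses import dataclass

# Thresholds quoted in the dataset description.
MAX_MITO_FRACTION = 0.15   # exclude cells with more than 15% mitochondrial content
MIN_TRANSCRIPTS = 200      # exclude cells with fewer than 200 transcripts
MAX_TRANSCRIPTS = 6000     # exclude cells with more than 6,000 transcripts


@dataclass
class CellQC:
    """Hypothetical per-cell summary; the field names are illustrative only."""
    cell_id: str
    n_transcripts: int     # expressed mRNA transcripts detected in the cell
    mito_fraction: float   # fraction of transcripts that are mitochondrial


def passes_qc(cell: CellQC) -> bool:
    """Return True if the cell survives all three exclusion criteria."""
    if cell.mito_fraction > MAX_MITO_FRACTION:
        return False
    if cell.n_transcripts < MIN_TRANSCRIPTS:
        return False
    if cell.n_transcripts > MAX_TRANSCRIPTS:
        return False
    return True


def filter_cells(cells):
    """Keep only the cells that pass QC, preserving input order."""
    return [cell for cell in cells if passes_qc(cell)]
```

    In practice these per-cell summaries would come from the count matrix itself; the sketch only shows how the three thresholds combine.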

  16. gtex-single-cell-rnaseq

    • huggingface.co
    Updated Nov 22, 2025
    Cite
    Lviv Polytechnic National University – Department of Artificial Intelligence Systems (2025). gtex-single-cell-rnaseq [Dataset]. https://huggingface.co/datasets/ai-department-lpnu/gtex-single-cell-rnaseq
    Dataset updated
    Nov 22, 2025
    Dataset authored and provided by
    Lviv Polytechnic National University – Department of Artificial Intelligence Systems
    Description

    GTEx Single-Cell RNA-seq Dataset

    This repository provides tools to create a Hugging Face dataset from GTEx single-nucleus RNA-seq data, transforming the hierarchical H5AD format into a flat, ML-ready structure.

    Overview

    Data Source

    The data comes from GTEx's snRNA-seq atlas:

    • Source: GTEx Portal
    • Publication: Eraslan et al., Science 2022, "Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function"
    • Content: 209,126…

    See the full description on the dataset page: https://huggingface.co/datasets/ai-department-lpnu/gtex-single-cell-rnaseq.

  17. H.sapien Genelab OSD Normalized RNA Seq Matrix

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    • +1 more
    Updated Dec 16, 2022
    Cite
    Somsanith, June; Barker, Richard (2022). H.sapien Genelab OSD Normalized RNA Seq Matrix [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_7443811
    Dataset updated
    Dec 16, 2022
    Authors
    Somsanith, June; Barker, Richard
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    H. sapiens normalized-counts RNA-seq data matrix from NASA GeneLab's Open Science Data Repository. Created using R.

  18. Coexpression networks of 31 GTEx and 256 SRA RNA-Seq datasets

    • zenodo.org
    txt, zip
    Updated Oct 11, 2021
    Cite
    Kayla Johnson; Arjun Krishnan (2021). Coexpression networks of 31 GTEx and 256 SRA RNA-Seq datasets [Dataset]. http://doi.org/10.5281/zenodo.5510567
    Available download formats: zip, txt
    Dataset updated
    Oct 11, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kayla Johnson; Arjun Krishnan
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This data repository contains coexpression networks from publicly-available RNA-Seq datasets (obtained from the recount2 database) that were generated using the best workflows identified in the benchmarking study: Johnson KA, Krishnan A (2020) Robust normalization and transformation techniques for constructing gene coexpression networks from RNA-seq data. bioRxiv 10.1101/2020.09.22.308577.

    GTEx coexpression networks
    There are 62 coexpression networks built from 31 GTEx datasets (each dataset corresponding to one GTEx tissue) reconstructed using two different network-building workflows: i) CTF_CLR: Counts adjusted using TMM Factors followed by CLR transformation of the Pearson correlation coefficients; ii) CTF: Counts adjusted using TMM Factors (without any further transformation).

    SRA coexpression networks
    There are 256 coexpression networks built from 256 SRA datasets. Each dataset corresponds to a set of samples generated as part of the same transcriptome experiment from the same tissue. These networks are reconstructed using the top-performing workflow: CTF, Counts adjusted using TMM Factors.

    Refer to the preprint for more details on the workflows and the steps used for obtaining the original datasets.
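
    To make the CTF_CLR workflow more concrete, the following is a minimal Python sketch of a CLR-style rescaling applied to a matrix of Pearson correlations. It assumes that CLR refers to the context likelihood of relatedness transform in its commonly cited form (per-gene z-scores clipped at zero, then combined for each pair); the function name clr_transform, the exclusion of the diagonal, and the use of the population standard deviation are choices made for illustration, and the preprint should be consulted for the exact variant used to build these networks.

```python
import math


def clr_transform(corr):
    """Rescale a symmetric correlation matrix with a CLR-style transform.

    corr is a square list of lists; diagonal entries are ignored.
    Each edge (i, j) is scored against the background distribution of
    gene i's correlations and of gene j's correlations.
    """
    n = len(corr)
    if n < 2:
        return [[0.0] * n for _ in range(n)]

    means, stds = [], []
    for i in range(n):
        row = [corr[i][j] for j in range(n) if j != i]
        mu = sum(row) / len(row)
        var = sum((x - mu) ** 2 for x in row) / len(row)
        means.append(mu)
        stds.append(math.sqrt(var))

    def z(i, j):
        # z-score of edge (i, j) within gene i's background, clipped at zero
        if stds[i] == 0.0:
            return 0.0
        return max(0.0, (corr[i][j] - means[i]) / stds[i])

    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            score = math.sqrt(z(i, j) ** 2 + z(j, i) ** 2)
            out[i][j] = out[j][i] = score
    return out
```

    The effect is that an edge scores highly only when it stands out from the typical correlations of both genes it connects, which down-weights genes that correlate moderately with everything.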

  19. small-RNA sequencing of sEV isolated from plasma of astronauts

    • catalog.data.gov
    • data.nasa.gov
    Updated Aug 30, 2025
    Cite
    Open Science Data Repository (2025). small-RNA sequencing of sEV isolated from plasma of astronauts [Dataset]. https://catalog.data.gov/dataset/small-rna-sequencing-of-sev-isolated-from-plasma-of-astronauts
    Dataset updated
    Aug 30, 2025
    Dataset provided by
    Open Science Data Repository
    Description

    We sought to determine whether the spaceflight environment can induce alterations in the small RNA content of small extracellular vesicles (sEV) and to assess the utility of these alterations as biomarkers. Using small RNA sequencing (sRNAseq), we evaluated the impact of the spaceflight environment on sEV miRNA content in peripheral blood (PB) plasma of 14 astronauts who flew STS missions between 1998 and 2001. Samples were collected at three time points: 10 days before the launch (L-10), the day of return (R-0), and three days post-landing (R+3).

  20. Human Transcriptome (RNA): PacBio Isoform Data

    • kaggle.com
    zip
    Updated May 12, 2024
    Cite
    A Merii (2024). Human Transcriptome (RNA): PacBio Isoform Data [Dataset]. https://www.kaggle.com/datasets/amerii/universal-human-rna-pacbio-isoform-data/code
    Available download formats: zip (1019068667 bytes)
    Dataset updated
    May 12, 2024
    Authors
    A Merii
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Description

    Dataset Description

    Overview

    This dataset contains processed RNA sequencing data utilizing the Kinnex full-length RNA kit from Pacific Biosciences. It features data derived from the Universal Human Reference RNA (UHRR), a composite of RNAs from multiple human cell lines that represents a broad cross-section of the human transcriptome. Isoforms in this context refer to different versions of mRNA produced from the same gene by alternative splicing, which can result in diverse protein outputs. This dataset was meticulously prepared using Revio sequencing systems and underwent various stages of processing to ensure detailed, high-quality transcriptomic data.

    Data Source

    The data originates from the /public/dataset/Kinnex-full-length-RNA/DATA-Revio-UHRR/4-Collapse directory, part of a comprehensive RNA sequencing dataset collection hosted by Pacific Biosciences. The last modification was made on October 24, 2023.

    Files and Contents

    • collapse_isoforms.fasta: Contains sequences of collapsed RNA isoforms in FASTA format (3.5 GB).
    • collapse_isoforms.flnc_count.txt: Provides Full-Length Non-Concatemer (FLNC) read counts for each isoform (63 MB).
    • collapse_isoforms.gff: Offers annotations for the isoforms in Generic Feature Format (GFF) (861 MB).
    • collapse_isoforms.group.txt: Includes grouping information for the isoforms (75 MB).
    • collapse_isoforms.read_stat.txt: Contains detailed read statistics for each isoform (2.3 GB).
    • collapse_isoforms.report.json: Summarizes the collapse process in a JSON report (706 bytes).
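
    Because collapse_isoforms.fasta is several gigabytes, it is usually read as a stream rather than loaded whole. The following is a minimal Python sketch of a streaming FASTA reader that reports sequence lengths; it relies only on the general FASTA layout (a header line starting with ">" followed by one or more sequence lines), and the record names in the demo are made up rather than taken from the dataset.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may wrap across several lines
    if header is not None:
        yield header, "".join(chunks)


def isoform_lengths(handle):
    """Map each record identifier (first token of the header) to its length."""
    lengths = {}
    for header, seq in read_fasta(handle):
        name = header.split()[0] if header else ""
        lengths[name] = len(seq)
    return lengths


# Demo on an in-memory example with hypothetical record names.
demo = io.StringIO(">isoA some description\nACGT\nAC\n>isoB\nGGG\n")
print(isoform_lengths(demo))
```

    For the real file, the same functions can be called on a handle opened with open("collapse_isoforms.fasta"); only one record is held in memory at a time.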

    Potential Machine Learning Applications

    This dataset is suitable for a range of machine learning applications in computational biology and genomics, including:

    • Gene Expression Prediction: Training models to forecast gene expression levels from isoform data.
    • Alternative Splicing Detection: Developing algorithms to detect and classify alternative splicing events from isoform sequences.
    • Transcriptomic Data Imputation: Implementing models to complete missing transcriptomic data, enhancing data completeness.
    • Disease Association Studies: Using the dataset to identify transcript variants linked to specific diseases by integrating it with phenotypic or clinical data.
    • Isoform Function Prediction: Predicting functions of RNA isoforms based on their sequence and structural features.

    Usage Notes

    Researchers and data scientists are encouraged to cite this dataset in any publications or reports. The data should be used in accordance with Pacific Biosciences' terms of service and is not intended for diagnostic procedures.

    Citation

    Please reference the following in your work:

    • Kinnex full-length RNA kit, Pacific Biosciences of California, Inc. Data extracted from the Kinnex full-length RNA sequencing public dataset repository.

    Disclaimer

    This dataset is derived from Pacific Biosciences' technologies. The use of this dataset is intended for non-commercial, research, and educational purposes only. Redistribution or commercial use is not permitted without express consent from Pacific Biosciences. This dataset is provided under a custom license that aligns with Pacific Biosciences' usage terms and restrictions. For full usage rights and restrictions, please refer to the terms and conditions provided by Pacific Biosciences at PacBio Terms of Service.

Cite
Hsu, Jonathan; Stoop, Allart (2023). Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10011621

Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset

Dataset updated
Nov 20, 2023
Authors
Hsu, Jonathan; Stoop, Allart
Description

Table of Contents

  1. Main Description
  2. File Descriptions
  3. Linked Files
  4. Installation and Instructions

1. Main Description

This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity." The code in the file marengo_code_for_paper_jan_2023.R was used to generate the figures from the single-cell RNA sequencing data. The following libraries are required for script execution:

  • Seurat
  • scRepertoire
  • ggplot2
  • stringr
  • dplyr
  • ggridges
  • ggrepel
  • ComplexHeatmap

2. File Descriptions

The code can be downloaded and opened in RStudio.

  • "marengo_code_for_paper_jan_2023.R" contains all the code needed to reproduce the figures in the paper.
  • "Marengo_newID_March242023.rds" is available at the following address: https://zenodo.org/badge/DOI/10.5281/zenodo.7566113.svg (Zenodo DOI: 10.5281/zenodo.7566113).
  • "all_res_deg_for_heat_updated_march2023.txt" contains the unfiltered results from the DGE analysis and was also used to create the DGE heatmap and the volcano plots.
  • "genes_for_heatmap_fig5F.xlsx" contains the genes included in the heatmap in Figure 5F.

3. Linked Files

This repository contains code for the analysis of a single-cell RNA-seq dataset. The dataset contains raw FASTQ files as well as the aligned files that were deposited in GEO. The "Rdata" or "Rds" file was deposited in Zenodo. Descriptions of the linked datasets are provided below:

Gene Expression Omnibus (GEO) ID: GSE223311 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)

  • Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
  • Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data.
  • Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

Sequence read archive (SRA) repository ID: SRX19088718 and SRX19088719

  • Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
  • Description: This submission contains the raw sequencing (.fastq.gz) files, which are gzip-compressed text files in FASTQ format.
  • Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

Zenodo DOI: 10.5281/zenodo.7566113 (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)

  • Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity.
  • Description: This submission contains the "Rdata" or ".Rds" file, which is an R object file. This file is required to run the code.
  • Submission type: Restricted Access. In order to gain access to the repository, you must contact the author.

4. Installation and Instructions

The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

Ensure you have R version 4.1.2 or higher for compatibility.

Although it is not essential, you can use RStudio (version 2022.12.0+353) to access and execute the code.

  1. Download the "Rdata" or ".Rds" file from Zenodo (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ) (Zenodo DOI: 10.5281/zenodo.7566113).
  2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.
  3. Set your working directory to where the following files are located:

     • marengo_code_for_paper_jan_2023.R
     • Install_Packages.R
     • Marengo_newID_March242023.rds
     • genes_for_heatmap_fig5F.xlsx
     • all_res_deg_for_heat_updated_march2023.txt

You can use the following code to set the working directory in R:

setwd(directory)

  4. Open the file titled "Install_Packages.R" and execute it in your R IDE. This script attempts to install all the necessary packages and their dependencies, setting up an environment in which the code in "marengo_code_for_paper_jan_2023.R" can be executed.
  5. Once the "Install_Packages.R" script has run successfully, restart RStudio or your IDE of choice.
  6. Open the file "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice.
  7. Execute the commands in "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice to generate the plots.