Table of Contents
Main Description File Descriptions Linked Files Installation and Instructions
This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity." The code in the file marengo_code_for_paper_jan_2023.R was used to generate the figures from the single-cell RNA sequencing data.
The following libraries are required for script execution:
Seurat, scRepertoire, ggplot2, stringr, dplyr, ggridges, ggrepel, ComplexHeatmap
The code can be downloaded and opened in RStudio. The "marengo_code_for_paper_jan_2023.R" file contains all the code needed to reproduce the figures in the paper. The "Marengo_newID_March242023.rds" file is available at the following address: https://zenodo.org/badge/DOI/10.5281/zenodo.7566113.svg (Zenodo DOI: 10.5281/zenodo.7566113). The "all_res_deg_for_heat_updated_march2023.txt" file contains the unfiltered results from the differential gene expression (DGE) analysis, which were also used to create the DGE heatmap and the volcano plots. The "genes_for_heatmap_fig5F.xlsx" file contains the genes included in the heatmap in Figure 5F.
This repository contains code for the analysis of a single-cell RNA-seq dataset. The dataset contains raw FASTQ files as well as the aligned files that were deposited in GEO. The "Rdata" or "Rds" file was deposited in Zenodo. Provided below are descriptions of the linked datasets:
Gene Expression Omnibus (GEO) ID: GSE223311 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)
Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data.
Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).
Sequence read archive (SRA) repository ID: SRX19088718 and SRX19088719
Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
Description: This submission contains the raw sequencing (.fastq.gz) files, which are gzip-compressed FASTQ text files.
Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).
Zenodo DOI: 10.5281/zenodo.7566113 (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)
Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity.
Description: This submission contains the "Rdata" or ".Rds" file, which is an R object file. This file is required to run the code.
Submission type: Restricted Access. In order to gain access to the repository, you must contact the author.
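As a rough sanity check on the GEO files described above, the dimensions declared in "matrix.mtx" can be compared with the line counts of "genes.tsv" and "barcodes.tsv". The sketch below is not part of the deposited code. It assumes the common 10x Genomics layout (matrix rows are genes, columns are barcodes) and uncompressed files in a single directory; the function name and its directory argument are hypothetical.

```shell
# Hypothetical helper (not from the repository): compare the size line of
# matrix.mtx with the line counts of genes.tsv and barcodes.tsv.
# Assumption: rows = genes, columns = barcodes, files are uncompressed.
check_10x_dims() {
    local dir="$1"
    local n_genes n_barcodes dims
    # Count lines with awk so a missing trailing newline is still counted.
    n_genes=$(awk 'END { print NR }' "$dir/genes.tsv")
    n_barcodes=$(awk 'END { print NR }' "$dir/barcodes.tsv")
    # The first line that is not a '%' comment holds "rows cols entries".
    dims=$(awk '!/^%/ { print $1, $2; exit }' "$dir/matrix.mtx")
    if [ "$dims" = "$n_genes $n_barcodes" ]; then
        echo "OK: $n_genes genes x $n_barcodes barcodes"
    else
        echo "MISMATCH: matrix.mtx declares '$dims', files have '$n_genes $n_barcodes'"
        return 1
    fi
}
```

Calling `check_10x_dims path/to/sample_dir` for each replicate directory prints one line per sample and returns a non-zero status on a mismatch.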
The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:
Ensure you have R version 4.1.2 or higher for compatibility.
Although it is not essential, you can use RStudio (version 2022.12.0+353) for accessing and executing the code.
marengo_code_for_paper_jan_2023.R Install_Packages.R Marengo_newID_March242023.rds genes_for_heatmap_fig5F.xlsx all_res_deg_for_heat_updated_march2023.txt
You can use the following code to set the working directory in R:
setwd(directory)
This is the GitHub repository for the single-cell RNA sequencing data analysis for the human manuscript.
The following essential libraries are required for script execution:
Seurat, scRepertoire, ggplot2, dplyr, ggridges, ggrepel, ComplexHeatmap
Linked Files
--------------------------------------
This repository contains code for the analysis of a single-cell RNA-seq dataset. The dataset contains raw FASTQ files as well as the aligned files that were deposited in GEO. Provided below are descriptions of the linked datasets:
1. Gene Expression Omnibus (GEO) ID: GSE229626
- Title: Gene expression profile at single cell level of human T cells stimulated via antibodies against the T Cell Receptor (TCR)
- Description: This submission contains the matrix.mtx, barcodes.tsv, and genes.tsv files for each replicate and condition, corresponding to the aligned files for single cell sequencing data.
- Submission type: Private. In order to gain access to the repository, you must use a "reviewer token" (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).
2. Sequence read archive (SRA) repository
- Title: Gene expression profile at single cell level of human T cells stimulated via antibodies against the T Cell Receptor (TCR)
- Description: This submission contains the raw sequencing (.fastq.gz) files, which are gzip-compressed FASTQ text files.
- Submission type: Private. In order to gain access to the repository, you must use a "reviewer token" (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).
Please note that since the GSE submission is private, the raw data deposited at SRA may not be accessible until the embargo on GSE229626 has been lifted.
Installation and Instructions
--------------------------------------
The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:
> Ensure you have R version 4.1.2 or higher for compatibility.
> Although it is not essential, you can use RStudio (version 2022.12.0+353) for accessing and executing the code.
The following code can be used to set the working directory in R:
> setwd(directory)
Steps:
1. Download the "Human_code_April2023.R" and "Install_Packages.R" R scripts, and the processed data from GSE229626.
2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.
3. Set your working directory to where the following files are located:
- Human_code_April2023.R
- Install_Packages.R
4. Open the file titled Install_Packages.R and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies.
5. Open the Human_code_April2023.R script and execute commands as necessary.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This collection of data is part of the RNA-Seq (HiSeq) PANCAN dataset. It is a random extraction of gene expressions of patients having different types of tumor: BRCA, KIRC, COAD, LUAD, and PRAD. Each sample contains the expression of 20,531 genes for a patient diagnosed with one of the following cancers:
| Code | Tumor Name |
|---|---|
| BRCA | Breast invasive carcinoma (breast cancer) |
| KIRC | Kidney renal clear cell carcinoma (kidney) |
| COAD | Colon adenocarcinoma (colon) |
| LUAD | Lung adenocarcinoma (lung) |
| PRAD | Prostate adenocarcinoma (prostate) |
Files:
data.csv: Gene expression matrix X (881 samples × 20,531 genes)
label.csv: True class label for each sample y (881 labels)
Source: UCI ML Repository – Gene Expression Cancer RNA-Seq Data
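Before loading the two files, it can help to confirm that their sizes match the figures quoted above. The following is a minimal sketch rather than anything shipped with the dataset: the function name is hypothetical, and it assumes plain comma-separated files with no quoted commas. Whether data.csv carries a header row or an index column is not stated here, so the raw counts may differ by one from 881 and 20,531.

```shell
# Hypothetical helper: print "<line count> <field count of first line>"
# for a comma-separated file. Assumes no quoted fields containing commas.
csv_shape() {
    awk -F',' 'NR == 1 { cols = NF } END { print NR, cols + 0 }' "$1"
}
```

Running `csv_shape data.csv` and `csv_shape label.csv` side by side shows whether the number of samples agrees between the expression matrix and the labels.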
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Simulated ONT direct RNA and 1D cDNA sequencing data of varying sequencing depths (0.5 million, 1 million, 3 million, and 5 million simulated reads) used for benchmark evaluations of transcript discovery and quantification in our paper "ESPRESSO: Robust discovery and quantification of transcript isoforms from error-prone long-read RNA-seq data". All details can be found in the Materials and Methods section of the paper.
HEK293T_DirectRNA.transcriptome_quantification.tsv and HEK293T_1DcDNA.transcriptome_quantification.tsv are tab-separated files containing estimated raw read counts and normalized abundance values (in TPM) of transcripts annotated in GENCODE v34lift37. Transcript quantification was done using NanoSim (version 3.1.0).
HEK293T_DirectRNA.NanoSim_500k.fastq.gz, HEK293T_DirectRNA.NanoSim_1M.fastq.gz, HEK293T_DirectRNA.NanoSim_3M.fastq.gz, and HEK293T_DirectRNA.NanoSim_5M.fastq.gz are gzip compressed FASTQ files containing 0.5 million, 1 million, 3 million, and 5 million simulated ONT direct RNA sequencing reads respectively.
HEK293T_1DcDNA.NanoSim_500k.fastq.gz, HEK293T_1DcDNA.NanoSim_1M.fastq.gz, HEK293T_1DcDNA.NanoSim_3M.fastq.gz, and HEK293T_1DcDNA.NanoSim_5M.fastq.gz are gzip compressed FASTQ files containing 0.5 million, 1 million, 3 million, and 5 million simulated ONT 1D cDNA sequencing reads respectively.
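The stated read depths can be checked directly from the compressed files. This sketch assumes standard four-line FASTQ records (no wrapped sequence lines); the function name is hypothetical and is not taken from the paper or from NanoSim.

```shell
# Hypothetical helper: count reads in a gzip-compressed FASTQ file,
# assuming every record spans exactly four lines.
count_fastq_reads() {
    gzip -dc "$1" | awk 'END { print int(NR / 4) }'
}
```

For example, `count_fastq_reads HEK293T_DirectRNA.NanoSim_500k.fastq.gz` would be expected to print 500000 if the file holds 0.5 million reads as described.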
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
This folder contains Illumina sequencing read files for teaching RNA-seq in the undergraduate classroom using CyVerse tools. Subfolders include data from shade- and ABA hormone-treated Arabidopsis plants. For complete descriptions of the data sets and experimental conditions, see Procko et al. Genes Dev. 2016 Jul 1;30(13):1529-41. doi: 10.1101/gad.283234.116. and Song et al. Science. 2016 Nov 4;354(6312). doi: 10.1126/science.aag1550. For the shade-treated data sets, only duplicates from the same batch are included.
https://ega-archive.org/dacs/EGAC50000000203
The ChRCC study RNA-Seq dataset contains raw whole-transcriptome sequencing data of 16 tumor and 6 adjacent normal samples from 7 UTSW patients who have consented to depositing their genomic data in a public repository. RNA-Seq was performed with 50 bp single-end reads on a HiSeq 2500 platform (Illumina, San Diego, CA, USA), with 50M reads per sample on average. The raw data are in FASTQ format.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RNA-seq data for analysing differential gene expression. The data are from bacteria (E. coli) and were subsampled to 1% of the original data size. The set comprises six FASTQ files and two reference files (genome sequence and annotations).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We analysed the field of expression profiling by high throughput sequencing, or HT-seq, in terms of replicability and reproducibility, using data from the NCBI GEO (Gene Expression Omnibus) repository.
- This release includes GEO series published up to Dec-31, 2020;
The geo-htseq.tar.gz archive contains the following files:
- output/parsed_suppfiles.csv, p-value histograms, histogram classes, estimated number of true null hypotheses (pi0).
- output/document_summaries.csv, document summaries of NCBI GEO series.
- output/suppfilenames.txt, list of all supplementary file names of NCBI GEO submissions.
- output/suppfilenames_filtered.txt, list of supplementary file names used for downloading files from NCBI GEO.
- output/publications.csv, publication info of NCBI GEO series.
- output/scopus_citedbycount.csv, Scopus citation info of NCBI GEO series
- output/spots.csv, NCBI SRA sequencing run metadata.
- output/cancer.csv, cancer related experiment accessions.
- output/transcription_factor.csv, TF related experiment accessions.
- output/single-cell.csv, single cell experiment accessions.
- blacklist.txt, list of supplementary files that were either too large to import or caused the computing environment to crash during import.
The workflow used to produce this dataset is available on GitHub at rstats-tartu/geo-htseq.
The geo-htseq-updates.tar.gz archive contains the following files:
- results/detools_from_pmc.csv, differential expression analysis programs inferred from published articles
- results/n_data.csv, manually curated sample size info for NCBI GEO HT-seq series
- results/simres_df_parsed.csv, pi0 values estimated from differential expression results obtained from simulated RNA-seq data
- results/data/parsed_suppfiles_rerun.csv, pi0 values estimated using smoother method from anti-conservative p-value sets
Lines beginning with # in these files are comments.
Example code for analyses presented in the paper is available at https://github.com/daskelly/CellStemCell_2020_diverse_mESCs
The files in this directory are:
* CCRIX_qPCR.tsv - data supporting top panel of Figure S3
* CCRIX_self_renewal.tsv - data supporting Figure 3E
* Nr5a2_ChIP.txt - data supporting Figure 4H
* counts_atac_norm_DO.tsv.gz - TMM-normalized counts for ATAC-Seq peaks called in Diversity Outbred samples
* counts_rna_norm_DO.tsv.gz - upper quartile-normalized counts for RNA-Seq gene expression in Diversity Outbred samples
* founder_nanog_flowcytometry.tsv.gz - data supporting Figure 1D
* genotype_probs.Rds - genotype probabilities used for QTL mapping. Format is 3D array (dimensions are samples x founder haplotypes x pseudomarkers)
* lifr_flowcytometry.tsv.gz - data supporting Figure S4C
* luciferase_assay_results.txt - data supporting Figure 4C,I
* quantitative_microscopy.tsv - data supporting Figure S1
* rna_seq_counts_allele_swap_ESCs.tsv - un-normalized estimated read counts derived from RNA-Seq data processed using EMASE as described in Methods
* rna_seq_counts_founder_ESCs.tsv - un-normalized estimated read counts derived from RNA-Seq data processed using EMASE as described in Methods
The aim of this work is to determine whether mycobacteria have enhanced virulence during space travel and what mechanisms they use to adapt to microgravity. M. marinum and LHM4 were grown in high aspect ratio vessels (HARV) in a rotary cell culture system (RCCS) under normal gravity (NG) or low-shear simulated microgravity (MG). To determine the effect of MG on the stress responses activated by the growth conditions, we used RNAseq to examine which genes were expressed. For RNAseq, the bacteria are harvested, the RNA is isolated and converted to complementary DNA (cDNA), and the cDNA is sequenced. Using bioinformatics, the expression levels of the different M. marinum genes were compared between the NG and MG samples. To make sure that we were examining only gene expression changes due to MG, only bacteria in early exponential growth were used in the RNAseq studies. Triplicate NG and MG cultures were used to generate samples of bacteria grown for ~40 hrs. We also grew triplicate cultures for 4 days, then diluted them and grew them for another ~40 hrs, so that we could examine gene expression from bacteria exposed for a longer time. In summary, this study determined that waterborne mycobacteria alter their growth, their expression of stress responses, and their sensitivity to oxidizing conditions when grown under MG.
https://ega-archive.org/dacs/EGAC00001003376
This dataset contains RNA sequencing (RNAseq) data of 814 patients from the CheckMate 649 clinical trial whose ICF allows data deposition into a public repository. Gene expression profiling was performed retrospectively using RNAseq on a subset of baseline tumor samples. Paired-end FASTQ files were processed on the Seven Bridges platform (Seven Bridges Genomics).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This workflow adapts the approach and parameter settings of Trans-Omics for precision Medicine (TOPMed). The RNA-seq pipeline originated from the Broad Institute. There are in total five steps in the workflow starting from:
For testing and analysis, the workflow authors provided example data created by down-sampling the read files of a TOPMed public-access dataset. Chromosome 12 was extracted from the Homo sapiens assembly 38 reference sequence and provided by the workflow authors. The required GTF and RSEM reference data files are also provided. The workflow is well documented, and a detailed set of instructions for the steps performed to down-sample the data is also provided for transparency. The availability of example input data, the use of containerization for the underlying software, and the detailed documentation were important factors in choosing this specific CWL workflow for CWLProv evaluation.
This dataset folder is a CWLProv Research Object that captures the Common Workflow Language execution provenance, see https://w3id.org/cwl/prov/0.5.0 or use https://pypi.org/project/cwl
Steps to reproduce
To build the research object again, use Python 3 on macOS. Built with:
Install cwltool
pip3 install cwltool==1.0.20180912090223
Install git lfs
The data download with the git repository requires the installation of Git lfs:
https://www.atlassian.com/git/tutorials/git-lfs#installing-git-lfs
Get the data and make the analysis environment ready:
git clone https://github.com/FarahZKhan/cwl_workflows.git
cd cwl_workflows/
git checkout CWLProvTesting
./topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/download_examples.sh
Run the following commands to create the CWLProv Research Object:
cwltool --provenance rnaseqwf_0.6.0_linux --tmp-outdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp --tmpdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp topmed-workflows/TOPMed_RNAseq_pipeline/rnaseq_pipeline_fastq.cwl topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/Dockstore.json
zip -r rnaseqwf_0.5.0_mac.zip rnaseqwf_0.5.0_mac
sha256sum rnaseqwf_0.5.0_mac.zip > rnaseqwf_0.5.0_mac_mac.zip.sha256
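Anyone who later downloads the archive may want to confirm that it still matches the recorded checksum. The helper below is a sketch under stated assumptions, not part of the original workflow: the function name is hypothetical, and it assumes the checksum file was written by `sha256sum FILE > FILE.sha256`, so that the hash is the first field of its first line. It falls back to `shasum -a 256` where `sha256sum` is not installed.

```shell
# Hypothetical helper: recompute the SHA-256 of a file and compare it with
# the first field of a checksum file produced by `sha256sum FILE > SUMFILE`.
# Returns 0 when the hashes match, non-zero otherwise.
verify_sha256() {
    local file="$1" sumfile="$2" expected actual
    expected=$(awk 'NR == 1 { print $1 }' "$sumfile")
    if command -v sha256sum >/dev/null 2>&1; then
        actual=$(sha256sum "$file" | awk '{ print $1 }')
    else
        # Fallback for systems that provide shasum instead of sha256sum.
        actual=$(shasum -a 256 "$file" | awk '{ print $1 }')
    fi
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

With the file names used in the commands above, a call might look like `verify_sha256 rnaseqwf_0.5.0_mac.zip rnaseqwf_0.5.0_mac_mac.zip.sha256 && echo "checksum OK"`.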
The https://github.com/FarahZKhan/cwl_workflows repository is a frozen snapshot from https://github.com/heliumdatacommons/TOPMed_RNAseq_CWL commit 027e8af41b906173aafdb791351fb29efc044120
Each of 70 cell samples, either at the control condition or treated with FDA-approved cancer drugs, is sequenced by the single-ended random-primed mRNA-sequencing method with a read length of 100 base pairs, and a total of 70 raw sequence data files in the FASTQ format are generated. These sequence data files are then analyzed by a high-performance computational pipeline, and ranked lists of gene signatures and biological processes related to drug-induced cardiotoxicity are generated for each drug. The raw sequence datasets and the analysis results have been carefully controlled for data quality, and they are made publicly available at the Gene Expression Omnibus (GEO) database repository of NIH. As such, this broad drug-stimulated transcriptomic dataset is valuable for the prediction of drug toxicities and their mitigations.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Increased susceptibility to fatigue is a negative predictor of survival commonly experienced by women with breast cancer (BC). In this study, the authors sought to identify molecular changes induced in human skeletal muscle by BC regardless of treatment history or tumor molecular subtype using RNA-sequencing (RNA-seq) and proteomic analyses.
Data access: The processed RNA-Seq and proteomics datasets generated during this study are publicly available in the figshare repository as part of this figshare data record: https://doi.org/10.6084/m9.figshare.12248951. The dataset ClinicalCharacteristics.xlsx is not publicly available in order to protect patient privacy, but will be made available on reasonable request from the corresponding author. The patients who took part in this study did not give consent to have their genetic data made publicly available, and therefore the raw transcriptomic and proteomics data are not publicly available. Raw RNA-Seq and proteomics data will be made available on reasonable request from the corresponding author to researchers who have completed a Data Usage Agreement. Corresponding author details: Dr. Emidio E. Pistilli, West Virginia University School of Medicine, email address: epistilli2@hsc.wvu.edu.
Study approval and patient consent: The procedures in this study were reviewed and approved by the West Virginia University Institutional Review Board (IRB). Informed written consent was obtained from each subject or each subject’s guardian.
Study aims and methodology: Muscle dysfunction in individuals with cancer is commonly thought to be a consequence of muscle atrophy, which is a major component of the paraneoplastic syndrome known as cancer cachexia. In this study, the authors tested the hypothesis that breast cancer induces a common molecular response in skeletal muscle that is independent of the molecular subtype of the tumor and the patient’s treatment history.
A total of 71 female surgical patients provided informed consent for inclusion in this study (control n=20; BC n=51).
Women with BC provided muscle biopsies from the pectoralis major muscle intraoperatively at the time of mastectomy, and control patients provided pectoralis major muscle samples intraoperatively during other breast surgeries. Women with BC were classified into four molecular subtypes based on immunohistochemical staining of their primary tumors: positive for estrogen receptor (ER) and progesterone receptor (PR), ERPR (n=20); overexpression of HER2/neu in the absence of ER and PR expression, HER2 (n=9); triple negative (absence of ER, PR, and HER2/neu expression), TN (n=11); or triple positive (presence of ER and PR expression, and overexpression of HER2/neu), TP (n=11).
Information on BMI at multiple time points was collected in 12 control and 50 BC patients. The following techniques are described in more detail in the published article: RNA sequencing, proteomics (including sample preparation, mass spectrometry, and mass spectrometry analysis), Western blotting, and patient muscle ATP quantification.
Animal experiments were approved by the WVU Institutional Animal Care and Use Committee, and conducted in accordance with the Guidelines for Ethical Conduct in the Care and Use of Nonhuman Animals in Research. BC-PDOX mice were created by implanting human BC tumor fragments into the mammary fat pad of female NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ/0557 (NSG) mice (n=6).
For the in vitro experiments, the following cell lines were used: EpH4-EV (immortalized normal murine mammary epithelium), EO771 (murine luminal BC), NF639 (murine HER2/neu-overexpressing BC), HEK293 (human embryonic kidney), and C2C12 (murine myoblasts).
Data supporting the figures and supplementary tables in the published article: The following datasets are included in this data record:
- 3000pts.csv in .csv file format
- AlbuminAndWeightLoss.csv in .csv file format
- ATPContentHuman.xlsx in .xlsx file format
- ATPContentPDOX.xlsx in .xlsx file format
- ATPProduction.xlsx in .xlsx file format
- GFP.xlsx in .xlsx file format
- RNASeqProteomicsCorrelation.xlsx in .xlsx file format, contains log-transformed gene and protein expression data for 8 patients with matched RNA-seq and proteomics data
- Supplementary Data 3.xlsx in .xlsx file format
- Supplementary Data1.xlsx in .xlsx file format
- Supplementary Data2.xlsx in .xlsx file format
- WBdata.xlsx
Dataset ClinicalCharacteristics.xlsx contains clinical information on study patients (i.e. body composition, race, treatment history, etc.) and will be made available on request.
Figure/Supplementary table supported by the datasets listed above:
- Figure 1: SupplementaryData1.xlsx
- Figure 2: AlbuminAndWeightLoss.csv, 3000pts.csv
- Figure 3: SupplementaryData1.xlsx
- Figure 4: SupplementaryData1.xlsx
- Figure 5: SupplementaryData2.xlsx, WBdata.xlsx, SupplementaryData3.xlsx
- Figure 6: SupplementaryData1.xlsx, ATPContentHuman.xlsx, ATPContentPDOX.xlsx, ATPProduction.xlsx, GFP.xlsx
- Supplementary table 1: SupplementaryData1.xlsx
- Supplementary table 2: SupplementaryData2.xlsx
- Supplementary table 3: SupplementaryData3.xlsx
Evidence before this study
We conducted an extensive literature search using Google Scholar without language restrictions, employing search terms such as “(Predicting OR Classifying OR Annotating) and (cancer hallmarks) AND (Deep OR Machine Learning) OR (Artificial Intelligence OR AI).” Despite notable advances in molecular oncology and computational methodologies, a critical gap remains: no existing machine learning or deep learning framework comprehensively predicts cancer hallmarks from tumor biopsy samples. Current research primarily targets specific molecular pathways associated with individual hallmarks, leaving clinicians without an integrated model to interpret hallmark activity at the level of an individual tumor. Moreover, the absence of wet-lab techniques capable of annotating all cancer hallmarks in biopsy samples has further impeded progress, limiting the clinical utility of hallmark-related insights for precision oncology.
Added value of this study
This study introdu...
Dataset Collection and Processing
We utilized a large-scale dataset comprising 2.7 million single-cell transcriptomes derived from 14 tumor types, collected from 922 patients across 51 independent studies conducted globally. This dataset was sourced from the Weizmann Institute's 3CA repository.
Quality Control
Before generating synthetic datasets for model training, the raw single-cell transcriptomic data underwent a rigorous quality control (QC) process. Cells with over 15% mitochondrial transcript content, or with fewer than 200 or more than 6,000 expressed mRNA transcripts, were excluded to ensure data reliability.
Gene Set Curation
Gene sets representing cancer hallmarks were compiled from multiple databases, retaining only genes identified in at least two independent sources. This selection was refined through manual literature reviews to exclude genes without direct or indirect roles in hallmark-related pathways.
Digital Scoring
Using the curated gene sets, Digital Scores were...
# Synthetic bulk RNA-Seq transcriptomic profiles representing 10 Cancer hallmarks
https://doi.org/10.5061/dryad.zw3r228jc
This dataset comprises single-cell transcriptomic data from the Weizmann 3CA repository, encompassing 2.7 million single-cell transcriptomes from 14 tumor types, collected from 922 patients across 51 global studies. The primary objective of the experimental efforts was to generate synthetic datasets for training and validating computational models to identify and analyze cancer hallmarks at the single-cell resolution.
Single-cell RNA sequencing (scRNA-seq) data underwent a rigorous quality control process to ensure reliability and biological relevance. This included exclusion criteria based on mitochondrial transcript content (>15%) and mRNA transcript counts (<200 or >6,000 transcripts). Gene sets corresponding to 10 estab...
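The exclusion criteria quoted above (more than 15% mitochondrial transcript content, fewer than 200 transcripts, or more than 6,000 transcripts) can be illustrated with a small filter over a per-cell metrics table. This is only a sketch of the thresholds, not the authors' pipeline: the function name, the tab-separated input, and its three columns (cell_id, pct_mito, n_transcripts) are all assumptions made for the example.

```shell
# Hypothetical helper: keep cells that pass the thresholds quoted above.
# Assumed input: tab-separated file with a header row and three columns:
#   cell_id, pct_mito (percent mitochondrial transcripts), n_transcripts.
filter_cells() {
    awk -F'\t' -v OFS='\t' '
        NR == 1 { print; next }    # pass the header row through unchanged
        # keep cells with <= 15% mitochondrial content and 200..6000 transcripts
        $2 <= 15 && $3 >= 200 && $3 <= 6000 { print }
    ' "$1"
}
```

Something like `filter_cells cell_metrics.tsv > cells_passing_qc.tsv` would then write the header plus the retained cells; both file names here are placeholders.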
GTEx Single-Cell RNA-seq Dataset
This repository provides tools to create a Hugging Face dataset from GTEx single-nucleus RNA-seq data, transforming the hierarchical H5AD format into a flat, ML-ready structure.
Overview
Data Source
The data comes from GTEx's snRNA-seq atlas:
Source: GTEx Portal
Publication: Eraslan et al., Science 2022 - "Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function"
Content: 209,126…
See the full description on the dataset page: https://huggingface.co/datasets/ai-department-lpnu/gtex-single-cell-rnaseq.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
H. sapiens normalized-counts RNA-seq data matrix from NASA GeneLab's open science data repository. Created using R.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data repository contains coexpression networks from publicly-available RNA-Seq datasets (obtained from the recount2 database) that were generated using the best workflows identified in the benchmarking study: Johnson KA, Krishnan A (2020) Robust normalization and transformation techniques for constructing gene coexpression networks from RNA-seq data. bioRxiv 10.1101/2020.09.22.308577.
GTEx coexpression networks
There are 62 coexpression networks built from 31 GTEx datasets (each dataset corresponding to one GTEx tissue) reconstructed using two different network-building workflows: i) CTF_CLR: Counts adjusted using TMM Factors followed by CLR transformation of the Pearson correlation coefficients; ii) CTF: Counts adjusted using TMM Factors (without any further transformation).
SRA coexpression networks
There are 256 coexpression networks built from 256 SRA datasets. Each dataset corresponds to a set of samples generated as part of the same transcriptome experiment from the same tissue. These networks are reconstructed using the top-performing workflow: CTF, Counts adjusted using TMM Factors.
Refer to the preprint for more details on the workflows and the steps used for obtaining the original datasets.
We sought to determine whether the spaceflight environment can induce alterations in the small RNA content of small extracellular vesicles (sEV) and to assess their utility as biomarkers. Using small RNA sequencing (sRNAseq), we evaluated the impact of the spaceflight environment on sEV miRNA content in peripheral blood (PB) plasma of 14 astronauts who flew STS missions between 1998 and 2001. Samples were collected at three time points: 10 days before launch (L-10), the day of return (R-0), and three days post-landing (R+3).
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset contains processed RNA sequencing data utilizing the Kinnex full-length RNA kit from Pacific Biosciences. It features data derived from the Universal Human Reference RNA (UHRR), a composite of RNAs from multiple human cell lines that represents a broad cross-section of the human transcriptome. Isoforms in this context refer to different versions of mRNA produced from the same gene by alternative splicing, which can result in diverse protein outputs. This dataset was meticulously prepared using Revio sequencing systems and underwent various stages of processing to ensure detailed, high-quality transcriptomic data.
The data originates from the /public/dataset/Kinnex-full-length-RNA/DATA-Revio-UHRR/4-Collapse directory, part of a comprehensive RNA sequencing dataset collection hosted by Pacific Biosciences. The last modification was made on October 24, 2023.
This dataset is suitable for a range of machine learning applications in computational biology and genomics, including: - Gene Expression Prediction: Training models to forecast gene expression levels from isoform data. - Alternative Splicing Detection: Developing algorithms to detect and classify alternative splicing events from isoform sequences. - Transcriptomic Data Imputation: Implementing models to complete missing transcriptomic data, enhancing data completeness. - Disease Association Studies: Using the dataset to identify transcript variants linked to specific diseases by integrating it with phenotypic or clinical data. - Isoform Function Prediction: Predicting functions of RNA isoforms based on their sequence and structural features.
Researchers and data scientists are encouraged to cite this dataset in any publications or reports. The data should be used in accordance with Pacific Biosciences' terms of service and is not intended for diagnostic procedures.
Please reference the following in your work: - Kinnex full-length RNA kit, Pacific Biosciences of California, Inc. Data extracted from the Kinnex full-length RNA sequencing public dataset repository.
This dataset is derived from Pacific Biosciences' technologies. The use of this dataset is intended for non-commercial, research, and educational purposes only. Redistribution or commercial use is not permitted without express consent from Pacific Biosciences. This dataset is provided under a custom license that aligns with Pacific Biosciences' usage terms and restrictions. For full usage rights and restrictions, please refer to the terms and conditions provided by Pacific Biosciences at PacBio Terms of Service.