Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of four samples of GEO accession GSE119855 with the IBU RNA-seq pipeline
Facebook
TwitterFor identifying the genes that are regulated by a transcription factor (TF), we have established an analytical pipeline that combines genomic systematic evolution of ligands by exponential enrichment (gSELEX)-Seq and RNA-Seq. Here, SELEX was used to select DNA fragments from an Aspergillus nidulans genomic library that bound specifically to AmyR, a TF from A. nidulans. High-throughput sequencing data were obtained for the DNAs enriched through the selection, following which various in silico analyses were performed. Mapping reads to the genome revealed the binding motifs including the canonical AmyR-binding motif, CGGN8CGG, as well as the candidate promoters controlled by AmyR. In parallel, differentially expressed genes related to AmyR were identified by using RNA-Seq analysis with samples from A. nidulans WT and amyR deletant. By obtaining the intersecting set of genes detected using both gSELEX-Seq and RNA-Seq, the genes directly regulated by AmyR in A. nidulans can be identified with high reliability. This analytical pipeline is a robust platform for comprehensive genome-wide identification of the genes that are regulated by a target TF.
Facebook
TwitterAdditional file 3: Table S1. Available configuration profiles. Configuration files exist under the SPEAQeasy/conf directory. Configuration profiles exist for SGE and SLURM clusters, as well as local execution on a Linux machine. These profiles can be customized for specific clusters, such as the JHPCE configuration file jhpce.config, which runs on an SGE cluster. The file a user chooses also depends on whether software dependencies are managed with docker, or are installed locally. Table S2. SPEAQeasy output files. Table of intermediary outputs generated by SPEAQeasy. These do not include the major output files of interest (Fig. 2), but other miscellaneous outputs from each processing step. In the Filename column, brackets denote one or more values dependent on a relevant variable; for example, the files [sample_name]_process_trace.log refer to a set of several files, each named distinctly according to the sample associated with the particular file. An asterisk represents a wildcard matching more than one file, when individual file names may depend on the experiment. For example, [sample_name]_trimmed*.fastq could refer to sample1_trimmed_1.fastq and sample1_trimmed_2.fastq. The next columns provide the directory containing each given file, relative to the output folder, and a description of the files’ content, respectively. Table S3. Quality metrics recorded in SPEAQeasy outputs. One of the major pipeline outputs is a comma-separated values (CSV) file where fields (columns) are different quality metrics, and each line (row) is associated with one sample. A list of the exact field names and their descriptions is given above. Table S4. SPEAQeasy-example differential expression and gene ontology results. (A) Differential expression results using the subset of BipSeq data analyzed in http://research.libd.org/SPEAQeasy-example/ . (B) Gene ontology enrichment results from the genes with a p-value < 0.005 in the differential expression results between bipolar disorder affected individuals and neurotypical controls. Table S5. Pipeline comparison. A comparison of usage-related features among several publicly available RNA-seq pipelines.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RNA-seq pipeline analysis of mouse single end libraries.
Facebook
TwitterU.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Background: mRNA interactions with each other and other signaling molecules define different biological pathways and functions. Researchers have been investigating various tools to analyze these types of interactions. In particular gene co-expression network methods have proved useful in finding and analyzing these molecular interactions. Many different analytical pipelines to identify these interactions networks have been proposed with the aim of identifying an optimal partition of the network where the individual modules are neither too small to make any general inference or too large to be biologically interpretable. Results: In this study we propose a new pipeline to perform gene co-expression network analysis. The proposed pipeline uses WGCNA a widely used software to perform different aspects of gene co-expression network analysis and modularity maximization algorithm to analyze novel RNA-Seq data to understand the effects of low-dose 56Fe ion irradiation on the formation of hepatocellular carcinoma in mice. The network results along with experimental validation show that using WGCNA combined with Modularity provide a more biologically interpretable network in our dataset. Our pipeline showed better performance than the existing clustering algorithm in WGCNA in finding modules and identified a module with mitochondrial subunits that are supported by mitochondrial complex assay. Conclusions: We present a pipeline that can reduce the problem of parameter selection with the existing algorithm in WGCNA for comparable RNA-Seq datasets which may assist in future research to discover novel mRNA interactions and their downstream molecular effects. C57BL16 males were placed into 2 treatment groups and received the following irradiation treatments at Brookhaven National Laboratories (Long Island NY): 600 MeV/n 56Fe (0.2 Gy) and no irradiation. Left liver lobes were collected at 30 60 120 270 and 360 days post-irradiation flash frozen and stored at -80 xc2 xb0C until they could be processed for RNA-Seq. Livers were sampled by taking two 40-micron thick slices using a cryotome at -20 xc2 xb0C. This allowed multiple sampling of the tissue without the tissue going through multiple freeze/thaw cycles. Total RNA was isolated from the liver slices using RNAqueousTM Total RNA Isolation Kit (ThermoFisher Scientific Waltham MA) and rRNA was removed via Ribo-ZeroTM rRNA Removal Kit (Illumina San Diego CA) prior to library preparation with the Illumina TruSeq RNA Library kit. Samples were sequenced in a paired-end 50 base format on an Illumina HiSeq 1500. Reads were aligned to the mouse GRCm38 reference genome using the STAR alignment program version 2.5.3a with the recommended ENCODE options. The -quantMode GeneCounts option was used to obtain read counts per gene based on the Gencode release M14 annotation file. Total number of reads used in analysis varies between 23-35 millions of reads.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Single cell RNA-seq data generated and reported as part of the manuscript entitled "Dissecting the mechanisms underlying the Cytokine Release Syndrome (CRS) mediated by T Cell Bispecific Antibodies" by Leclercq-Cohen et al 2023. Raw and processed (filtered and annotated) data are provided as AnnData objects which can be directly ingested to reproduce the findings of the paper or for ab initio data reuse: 1- raw.zip provides concatenated raw/unfiltered counts for the 20 samples in the standard Market Exchange Format (MEX) format. 2- 230330_sw_besca2_LowFil_raw.h5ad contains filtered cells and raw counts in the HDF5 format. 3- 221124_sw_besca2_LowFil.annotated.h5ad contains filtered cells and log normalized counts, along with cell type annotation in the HDF5 format.
scRNAseq data generation: Whole blood from 4 donors was treated with 0.2 μg/mL CD20-TCB, or incubated in the absence of CD20- TCB. At baseline (before addition of TCB) and assay endpoints (2, 4, 6, and 20 hrs), blood was collected for total leukocyte isolation using EasySepTM red blood cell depletion reagent (Stemcell). Briefly, cells were counted and processed for single cell RNA sequencing using the BD Rhapsody platform. To load several samples on a single BD Rhapsody cartridge, sample cells were labelled with sample tags (BD Human Single-Cell Multiplexing Kit) following the manufacturer’s protocol prior to pooling. Briefly, 1x106 cells from each sample were re-suspended in 180 μL FBS Stain Buffer (BD, PharMingen) and sample tags were added to the respective samples and incubated for 20 min at RT. After incubation, 2 successive washes were performed by addition of 2 mL stain buffer and centrifugation for 5 min at 300 g. Cells were then re- suspended in 620 μL cold BD Sample Buffer, stained with 3.1 μL of both 2 mM Calcein AM (Thermo Fisher Scientific) and 0.3 mM Draq7 (BD Biosciences) and finally counted on the BD Rhapsody scanner. Samples were then diluted and/or pooled equally in 650 μL cold BD Sample Buffer. The BD Rhapsody cartridges were then loaded with up to 40 000 – 50 000 cells. Single cells were isolated using Single-Cell Capture and cDNA Synthesis with the BD Rhapsody Express Single-Cell Analysis System according to the manufacturer’s recommendations (BD Biosciences). cDNA libraries were prepared using the Whole Transcriptome Analysis Amplification Kit following the BD Rhapsody System mRNA Whole Transcriptome Analysis (WTA) and Sample Tag Library Preparation Protocol (BD Biosciences). Indexed WTA and sample tags libraries were quantified and quality controlled on the Qubit Fluorometer using the Qubit dsDNA HS Assay, and on the Agilent 2100 Bioanalyzer system using the Agilent High Sensitivity DNA Kit. Sequencing was performed on a Novaseq 6000 (Illumina) in paired-end mode (64-8- 58) with Novaseq6000 S2 v1 or Novaseq6000 SP v1.5 reagents kits (100 cycles). scRNAseq data analysis: Sequencing data was processed using the BD Rhapsody Analysis pipeline (v 1.0 https://www.bd.com/documents/guides/user-guides/GMX_BD-Rhapsody-genomics- informatics_UG_EN.pdf) on the Seven Bridges Genomics platform. Briefly, read pairs with low sequencing quality were first removed and the cell label and UMI identified for further quality check and filtering. Valid reads were then mapped to the human reference genome (GRCh38-PhiX-gencodev29) using the aligner Bowtie2 v2.2.9, and reads with the same cell label, same UMI sequence and same gene were collapsed into a single raw molecule while undergoing further error correction and quality checks. Cell labels were filtered with a multi-step algorithm to distinguish those associated with putative cells from those associated with noise. After determining the putative cells, each cell was assigned to the sample of origin through the sample tag (only for cartridges with multiplex loading). Finally, the single-cell gene expression matrices were generated and a metrics summary was provided. After pre-processing with BD’s pipeline, the count matrices and metadata of each sample were aggregated into a single adata object and loaded into the besca v2.3 pipeline for the single cell RNA sequencing analysis (43). First, we filtered low quality cells with less than 200 genes, less than 500 counts or more than 30% of mitochondrial reads. This permissive filtering was used in order to preserve the neutrophils. We further excluded potential multiplets (cells with more than 5,000 genes or 20,000 counts), and genes expressed in less than 30 cells. Normalization, log-transformed UMI counts per 10,000 reads [log(CP10K+1)], was applied before downstream analysis. After normalization, technical variance was removed by regressing out the effects of total UMI counts and percentage of mitochondrial reads, and gene expression was scaled. The 2,507 most variable genes (having a minimum mean expression of 0.0125, a maximum mean expression of 3 and a minimum dispersion of 0.5) were used for principal component analysis. Finally, the first 50 PCs were used as input for calculating the 10 nearest neighbours and the neighbourhood graph was then embedded into the two-dimensional space using the UMAP algorithm at a resolution of 2. Cell type annotation was performed using the Sig-annot semi-automated besca module, which is a signature- based hierarchical cell annotation method. The used signatures, configuration and nomenclature files can be found at https://github.com/bedapub/besca/tree/master/besca/datasets. For more details, please refer to the publication.
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains single-cell RNA sequencing (scRNA-seq) data processed using the Scanpy pipeline.
It focuses on the GSE145926 dataset from publicly available sources.
The data has been ingested and stored in HDF5 format for easy access and manipulation.
It includes pre-processed expression matrices suitable for downstream analysis.
The dataset enables exploratory analysis using Plotly interactive visualizations.
It allows researchers to examine gene expression patterns at single-cell resolution.
Includes metadata annotations for cell types and experimental conditions.
Facilitates differential expression analysis and cell clustering investigations.
Supports visualization of key immune markers such as CD3E across cell populations.
Designed for bioinformaticians, computational biologists, and immunology researchers.
Provides an end-to-end demonstration of Scanpy workflow in Python.
Enables reproducibility and further expansion for custom analyses.
Facebook
TwitterA standardized pipeline for processing scRNA-seq data, integrating multiple tools for data wrangling, visualization, cell type identification, and analysis of changes in cellular activity.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This workflow adapts the approach and parameter settings of Trans-Omics for precision Medicine (TOPMed). The RNA-seq pipeline originated from the Broad Institute. There are in total five steps in the workflow starting from:
For testing and analysis, the workflow author provided example data created by down-sampling the read files of a TOPMed public access data. Chromosome 12 was extracted from the Homo Sapien Assembly 38 reference sequence and provided by the workflow authors. The required GTF and RSEM reference data files are also provided. The workflow is well-documented with a detailed set of instructions of the steps performed to down-sample the data are also provided for transparency. The availability of example input data, use of containerization for underlying software and detailed documentation are important factors in choosing this specific CWL workflow for CWLProv evaluation.
This dataset folder is a CWLProv Research Object that captures the Common Workflow Language execution provenance, see https://w3id.org/cwl/prov/0.5.0 or use https://pypi.org/project/cwl
Steps to reproduce
To build the research object again, use Python 3 on macOS. Built with:
Install cwltool
pip3 install cwltool==1.0.20180912090223
Install git lfs
The data download with the git repository requires the installation of Git lfs:
https://www.atlassian.com/git/tutorials/git-lfs#installing-git-lfs
Get the data and make the analysis environment ready:
git clone https://github.com/FarahZKhan/cwl_workflows.git
cd cwl_workflows/
git checkout CWLProvTesting
./topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/download_examples.sh
Run the following commands to create the CWLProv Research Object:
cwltool --provenance rnaseqwf_0.6.0_linux --tmp-outdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp --tmpdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp topmed-workflows/TOPMed_RNAseq_pipeline/rnaseq_pipeline_fastq.cwl topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/Dockstore.json
zip -r rnaseqwf_0.5.0_mac.zip rnaseqwf_0.5.0_mac
sha256sum rnaseqwf_0.5.0_mac.zip > rnaseqwf_0.5.0_mac_mac.zip.sha256
The https://github.com/FarahZKhan/cwl_workflows repository is a frozen snapshot from https://github.com/heliumdatacommons/TOPMed_RNAseq_CWL commit 027e8af41b906173aafdb791351fb29efc044120
Facebook
TwitterMassively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki.
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains processed single-cell RNA-sequencing (scRNA-seq) data from the PBMC 3K experiment.
It includes quality-control (QC) visualizations, cell-level metrics, clustering outputs, and exploratory analysis plots.
The dataset is designed to guide beginners and intermediate users through the essential steps of scRNA-seq preprocessing and analysis.
The PBMC 3K dataset represents human peripheral blood mononuclear cells sequenced using the 10x Genomics platform.
Included QC metrics help identify low-quality cells, doublets, stressed cells, and outliers based on standard thresholds.
The dataset covers filtering based on mitochondrial gene percentage, total UMIs, and number of detected genes.
All plots follow widely accepted scRNA-seq workflows commonly used in tools like Seurat, Scanpy, and SingleCellExperiment.
The QC violin plots illustrate distributions of nFeature_RNA, nCount_RNA, percent.mt, and other metrics used to assess cell quality.
The data also highlights the effect of filtering on overall dataset structure and variability.
Clustering-related files provide a visual understanding of how cells segregate into biologically meaningful groups.
Dimensionality-reduction plots also show patterns such as immune-cell diversity present in PBMC populations.
This dataset is suitable for hands-on learning, tutorial creation, classroom instruction, or benchmarking workflows.
It serves as a ready reference for researchers who wish to practice QC interpretation and cluster inspection.
The dataset allows quick reproduction of PBMC 3K quality-control visualizations without running the entire analysis pipeline.
It provides an accessible introduction to scRNA-seq analysis concepts for students, data scientists, and bioinformaticians.
Facebook
Twitterhttps://www.technavio.com/content/privacy-noticehttps://www.technavio.com/content/privacy-notice
Facebook
TwitterEven though high-throughput transcriptome sequencing is routinely performed in many laboratories, computational analysis of such data remains a cumbersome process often executed manually, hence error-prone and lacking reproducibility. For corresponding data processing, we introduce Curare, an easy-to-use yet versatile workflow builder for analyzing high-throughput RNA-Seq data focusing on differential gene expression experiments. Data analysis with Curare is customizable and subdivided into preprocessing, quality control, mapping, and downstream analysis stages, providing multiple options for each step while ensuring the reproducibility of the workflow. For a fast and straightforward exploration and visualization of differential gene expression results, we provide the gene expression visualizer software GenExVis. GenExVis can create various charts and tables from simple gene expression tables and DESeq2 results without the requirement to upload data or install software packages.
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains RNA-Seq differential gene expression (DGE) analysis data.
It is derived from the Pasilla fruit fly dataset.
The data is processed using DESeq2, a widely-used tool for DGE analysis in R.
It includes gene counts, normalized counts, and statistical test results.
Users can explore differentially expressed genes between experimental conditions.
The dataset is suitable for transcriptomics, bioinformatics, and genomics research.
It can be used for benchmarking DGE analysis pipelines.
The dataset provides reproducible examples for learning DESeq2 workflows.
The source data is publicly available from the original Pasilla RNA-Seq study.
The dataset can be used to visualize and interpret RNA-Seq results in R.
It is ideal for researchers, students, and data scientists interested in genomics.
The dataset helps understand gene expression changes under experimental conditions.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Metasheet Example (CSV 600 bytes)
Facebook
TwitterAdditional file 2: Figure S2: SPEAQeasy logs tracing computational steps by sample. To aid transparency and greatly simplify the source of execution errors, SPEAQeasy automatically generates logs with several pieces of information for every sample. In order of submission, the name of each Nextflow process is printed, along with (1) the working directory: where all relevant files are present, (2) the exit code: a standard indication of whether the process succeeded or how it failed, (3) a list of the specific commands run during the given process. Above is a snapshot of the top of an example log
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data corresponding to the scientific initiation project in bioinformatics.
The data in the parcial_report_input folder are equivalent to raw counts, metadata and data of FPKM values of genes from the analysis performed during the first six months of the project. They correspond the counts and metadata from a previous study from renal cell carcinoma. As for the Final_report_input, it also contains counts and metadata, but from a previous study of ostesarcoma. The metadata and raw data files can be found under the accesion number hs000699.v1.p1 in dbGAP. The scripts wrote to perform pre-processing of samples, differential expression analysis, network analysis and functional annotation can be found in GitHub repository.
Facebook
TwitterU.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Traveling to nearby extraterrestrial objects having a reduced gravity level (partial gravity) compared to Earth's gravity is becoming a realistic objective for space agencies. The use of plants as part of life support systems will require a better understanding of the interactions among plant growth responses including tropisms, under partial gravity conditions. Here, we present results from our latest space experiments on the ISS, in which seeds of Arabidopsis thaliana were germinated, and seedlings grew for six days under different gravity levels, namely micro-g, several intermediate partial-g levels, and 1g, and were subjected to irradiation with blue light for the last 48 hours. RNA was extracted from 20 samples for subsequent RNAseq analysis. Transcriptomic analysis was performed using the HISAT2-Stringtie-DESeq pipeline. Differentially expressed genes were further characterized for global responses using the GEDI tool, gene networks and for Gene Ontology (GO) enrichment.
Facebook
TwitterRNA sequencing (RNA-seq) is a reliable tool for detecting gene fusions in acute leukemia. Multiple bioinformatics pipelines have been developed to analyze RNA-seq data, but an agreed gold standard has not been established. This study aimed to compare the applicability of 5 fusion calling pipelines (Arriba, deFuse, CICERO, FusionCatcher, and STAR-Fusion), as well as to define and develop an integrative bioinformatics pipeline (Fusion InPipe) to detect clinically relevant gene fusions in acute pediatric leukemia. We analyzed RNA-seq data by each pipeline individually and by Fusion InPipe. Each algorithm individually called most of the fusions with similar sensitivity and precision. However, not all rearrangements were called, suggesting that choosing a single pipeline might cause missing important fusions. To improve this, we integrated the results of the five algorithms in just one pipeline, Fusion InPipe, comparing the output from the agreement of 5/5, 4/5, and 3/5 algorithms. The maximum sensitivity was achieved with the agreement of 3/5 algorithms, with a global sensitivity of 95%, achieving a 100% in patients’ data. Furthermore, we showed the necessity of filtering steps to reduce the false positive detection rate. Here, we demonstrate that Fusion InPipe is an excellent tool for fusion detection in pediatric acute leukemia with the best performance when selecting those fusions called by at least 3/5 pipelines.
Facebook
TwitterDescription of columns in the metadata file
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of four samples of GEO accession GSE119855 with the IBU RNA-seq pipeline