100+ datasets found
  1. DataSheet1_ML-GAP: machine learning-enhanced genomic analysis pipeline using...

    • frontiersin.figshare.com
    • figshare.com
    pdf
    Updated Sep 27, 2024
    Cite
    Melih Agraz; Dincer Goksuluk; Peng Zhang; Bum-Rak Choi; Richard T. Clements; Gaurav Choudhary; George Em Karniadakis (2024). DataSheet1_ML-GAP: machine learning-enhanced genomic analysis pipeline using autoencoders and data augmentation.PDF [Dataset]. http://doi.org/10.3389/fgene.2024.1442759.s001
    Available download formats: pdf
    Dataset updated
    Sep 27, 2024
    Dataset provided by
    Frontiers
    Authors
    Melih Agraz; Dincer Goksuluk; Peng Zhang; Bum-Rak Choi; Richard T. Clements; Gaurav Choudhary; George Em Karniadakis
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: The advent of RNA sequencing (RNA-Seq) has significantly advanced our understanding of the transcriptomic landscape, revealing intricate gene expression patterns across biological states and conditions. However, the complexity and volume of RNA-Seq data pose challenges in identifying differentially expressed genes (DEGs), which are critical for understanding the molecular basis of diseases like cancer.

    Methods: We introduce a novel Machine Learning-Enhanced Genomic Data Analysis Pipeline (ML-GAP) that incorporates autoencoders and innovative data augmentation strategies, notably the MixUp method, to overcome these challenges. By creating synthetic training examples through a linear combination of input pairs and their labels, MixUp significantly enhances the model's ability to generalize from the training data to unseen examples.

    Results: Our results demonstrate ML-GAP's superiority in accuracy, efficiency, and insights, particularly crediting the MixUp method for its substantial contribution to the pipeline's effectiveness, greatly advancing genomic data analysis and setting a new standard in the field.

    Discussion: This, in turn, suggests that ML-GAP not only has the potential to detect DEGs more accurately but also offers new avenues for therapeutic intervention and research. By integrating explainable artificial intelligence (XAI) techniques, ML-GAP ensures a transparent and interpretable analysis, highlighting the significance of identified genetic markers.
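
    The MixUp step described above (a linear combination of input pairs and their labels) can be sketched in a few lines of Python. This is a minimal illustration, not the ML-GAP implementation: the function name `mixup`, the three-gene toy profiles, and the choice `alpha=0.2` are assumptions made for the example. Drawing the mixing weight from a Beta(alpha, alpha) distribution follows the original MixUp formulation.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Blend two labelled examples into one synthetic example.

    x_a, x_b: feature vectors of equal length (e.g. expression values).
    y_a, y_b: one-hot (or soft) label vectors of equal length.
    alpha:    Beta distribution parameter (assumed value, not from the dataset).
    """
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


# Two hypothetical expression profiles (three genes) with one-hot class labels.
x_mix, y_mix, lam = mixup([5.0, 0.0, 2.0], [1.0, 0.0], [1.0, 4.0, 2.0], [0.0, 1.0])
```

    Because the same weight is applied to features and labels, the mixed label stays a valid probability vector, and a feature that is identical in both inputs is left unchanged.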

  2. Supporting data for "Software pipelines for RNA-Seq, ChIP-Seq and Germline...

    • data.niaid.nih.gov
    Updated Sep 27, 2023
    Cite
    Fotis Psomopoulos (2023). Supporting data for "Software pipelines for RNA-Seq, ChIP-Seq and Germline Variant calling analyses in Common Workflow Language (CWL)" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8116555
    Dataset updated
    Sep 27, 2023
    Dataset provided by
    Nikolaos Pechlivanis
    Konstantinos Kyritsis
    Fotis Psomopoulos
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets produced during the validation of CWL-based pipelines designed for the analysis of data from RNA-Seq, ChIP-Seq and germline variant calling experiments. Specifically, the workflows were tested using publicly available high-throughput sequencing (HTS) data from published studies on Chronic Lymphocytic Leukemia (CLL) (accession numbers: E-MTAB-6962, GSE115772) and Genome in a Bottle (GIAB) project samples (accession numbers: SRR6794144, SRR22476789, SRR22476790, SRR22476791).

    The supporting data include:

    • Differential transcript and gene expression results produced during the analysis with the CWL-based RNA-Seq pipeline
    • Bigwig and narrowPeak files, differential binding results, table of consensus peaks and read counts of EZH2 and H3K27me3, produced during the analysis with the CWL-based ChIP-Seq pipeline
    • VCF files containing the detected and filtered variants, along with the respective hap.py results from comparisons against the GIAB gold-standard truth sets, for both CWL-based germline variant calling pipelines
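
    Comparisons of called variants against a truth set are usually summarised as precision, recall and F1. The Python sketch below shows those standard formulas only. The function name `benchmark_metrics` and the counts are hypothetical and are not taken from this dataset.

```python
def benchmark_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from true-positive, false-positive
    and false-negative variant counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up counts for illustration only.
precision, recall, f1 = benchmark_metrics(tp=90, fp=10, fn=30)
```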

  3. Example RNA-seq analysis of data from GSE119855

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 10, 2023
    Cite
    Geert van Geest (2023). Example RNA-seq analysis of data from GSE119855 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7691546
    Dataset updated
    Mar 10, 2023
    Dataset authored and provided by
    Geert van Geest
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of four samples of GEO accession GSE119855 with the IBU RNA-seq pipeline

  4. Data from: Efficient Identification of Multiple Pathways: RNA-Seq Analysis...

    • catalog.data.gov
    • data.nasa.gov
    • +1 more
    Updated Apr 24, 2025
    Cite
    National Aeronautics and Space Administration (2025). Efficient Identification of Multiple Pathways: RNA-Seq Analysis of Livers from 56Fe Ion Irradiated Mice [Dataset]. https://catalog.data.gov/dataset/efficient-identification-of-multiple-pathways-rna-seq-analysis-of-livers-from-56fe-ion-irr-3f51e
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    NASA http://nasa.gov/
    Description

    Background: mRNA interactions with each other and with other signaling molecules define different biological pathways and functions. Researchers have been investigating various tools to analyze these types of interactions. In particular, gene co-expression network methods have proved useful in finding and analyzing these molecular interactions. Many different analytical pipelines to identify these interaction networks have been proposed, with the aim of identifying an optimal partition of the network where the individual modules are neither too small to support general inference nor too large to be biologically interpretable.

    Results: In this study we propose a new pipeline to perform gene co-expression network analysis. The proposed pipeline uses WGCNA, a widely used software package for different aspects of gene co-expression network analysis, together with a modularity maximization algorithm to analyze novel RNA-Seq data and understand the effects of low-dose 56Fe ion irradiation on the formation of hepatocellular carcinoma in mice. The network results, along with experimental validation, show that using WGCNA combined with Modularity provides a more biologically interpretable network in our dataset. Our pipeline showed better performance than the existing clustering algorithm in WGCNA in finding modules and identified a module with mitochondrial subunits that is supported by a mitochondrial complex assay.

    Conclusions: We present a pipeline that can reduce the problem of parameter selection with the existing algorithm in WGCNA for comparable RNA-Seq datasets, which may assist future research to discover novel mRNA interactions and their downstream molecular effects.

    C57BL16 males were placed into 2 treatment groups and received the following irradiation treatments at Brookhaven National Laboratories (Long Island, NY): 600 MeV/n 56Fe (0.2 Gy) and no irradiation. Left liver lobes were collected at 30, 60, 120, 270 and 360 days post-irradiation, flash frozen and stored at -80 °C until they could be processed for RNA-Seq. Livers were sampled by taking two 40-micron thick slices using a cryotome at -20 °C. This allowed multiple sampling of the tissue without the tissue going through multiple freeze/thaw cycles. Total RNA was isolated from the liver slices using the RNAqueous™ Total RNA Isolation Kit (ThermoFisher Scientific, Waltham, MA) and rRNA was removed via the Ribo-Zero™ rRNA Removal Kit (Illumina, San Diego, CA) prior to library preparation with the Illumina TruSeq RNA Library kit. Samples were sequenced in a paired-end 50 base format on an Illumina HiSeq 1500. Reads were aligned to the mouse GRCm38 reference genome using the STAR alignment program version 2.5.3a with the recommended ENCODE options. The --quantMode GeneCounts option was used to obtain read counts per gene based on the Gencode release M14 annotation file. The total number of reads used in the analysis varies between 23 and 35 million.
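
    With the GeneCounts quantification mode mentioned above, STAR writes a tab-separated ReadsPerGene.out.tab file per sample. It has four summary rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) followed by one row per gene, with columns for unstranded, forward-stranded and reverse-stranded counts. The Python sketch below shows one way to read such a file. The function name `read_star_gene_counts` and all count values are made up for illustration, and which count column is appropriate depends on the library's strandedness, which the description does not state.

```python
import csv
import io


def read_star_gene_counts(handle, column=1):
    """Parse a STAR ReadsPerGene.out.tab stream.

    column: 1 = unstranded counts, 2 = read 1 on the feature strand,
            3 = read 2 on the feature strand.
    Returns (gene_counts, summary_counts) as two dictionaries.
    """
    genes, summary = {}, {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        target = summary if row[0].startswith("N_") else genes
        target[row[0]] = int(row[column])
    return genes, summary


# Tiny in-memory example with invented counts.
example = io.StringIO(
    "N_unmapped\t100\t100\t100\n"
    "N_multimapping\t50\t50\t50\n"
    "N_noFeature\t20\t400\t30\n"
    "N_ambiguous\t5\t1\t4\n"
    "ENSMUSG00000000001\t120\t3\t118\n"
    "ENSMUSG00000000028\t40\t1\t39\n"
)
genes, summary = read_star_gene_counts(example)
```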

  5. Ngs-Based Rna-Seq Market Analysis North America, Europe, Asia, Rest of World...

    • technavio.com
    Cite
    Technavio, Ngs-Based Rna-Seq Market Analysis North America, Europe, Asia, Rest of World (ROW) - US, UK, Germany, Singapore, China - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/ngs-based-rna-seq-market-analysis
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    United States, Global
    Description


    NGS-Based RNA-Seq Market Size 2024-2028

    The NGS-based RNA-seq market size is forecast to increase by USD 6.66 billion, at a CAGR of 20.52% between 2023 and 2028.
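
    The two figures in this forecast can be related with the compound annual growth rate formula, final = base × (1 + CAGR)^years. The Python sketch below back-calculates the base market size those figures would imply. It assumes five compounding periods (2023 to 2028) and that the USD 6.66 billion increase is measured from the 2023 base. Neither assumption is stated in the listing, and the function name `implied_base_size` is hypothetical.

```python
def implied_base_size(increase, cagr, years):
    """Base value such that compounding at `cagr` for `years` periods
    adds exactly `increase` to it."""
    growth = (1.0 + cagr) ** years
    return increase / (growth - 1.0)


# Figures from the forecast; the five-year horizon is an assumption.
base = implied_base_size(increase=6.66, cagr=0.2052, years=5)
final = base + 6.66
```

    Under these assumptions the implied base is a little over USD 4 billion, which is a consistency check on the quoted numbers rather than a figure reported by the source.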

    The market is witnessing significant growth, driven by the increased adoption of next-generation sequencing (NGS) methods for RNA-Seq analysis. The advanced capabilities of NGS techniques, such as high-throughput, cost-effectiveness, and improved accuracy, have made them the preferred choice for researchers and clinicians in various fields, including genomics, transcriptomics, and personalized medicine. However, the market faces challenges, primarily from the lack of clinical validation on direct-to-consumer genetic tests. As the use of NGS technology in consumer applications expands, ensuring the accuracy and reliability of results becomes crucial.
    The absence of standardized protocols and regulatory oversight in this area poses a significant challenge to market growth and trust. Companies seeking to capitalize on market opportunities must focus on addressing these challenges through collaborations, partnerships, and investments in research and development to ensure the clinical validity and reliability of their NGS-based RNA-Seq offerings.
    

    What will be the Size of the NGS-based RNA-Seq market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2018-2022 and forecasts 2024-2028 - in the full report.

    The market continues to evolve, driven by advancements in NGS technology and its applications across various sectors. Spatial transcriptomics, a novel approach to studying gene expression in its spatial context, is gaining traction in disease research and precision medicine. Splice junction detection, a critical component of RNA-seq data analysis, enhances the accuracy of gene expression profiling and differential gene expression studies. Cloud computing plays a pivotal role in handling the massive amounts of data generated by NGS platforms, enabling real-time data analysis and storage. Enrichment analysis, gene ontology, and pathway analysis facilitate the interpretation of RNA-seq data, while data normalization and quality control ensure the reliability of results.

    Precision medicine and personalized therapy are key applications of RNA-seq, with single-cell RNA-seq offering unprecedented insights into the complexities of gene expression at the single-cell level. Read alignment and variant calling are essential steps in RNA-seq data analysis, while bioinformatics pipelines and RNA-seq software streamline the process. NGS technology is revolutionizing drug discovery by enabling the identification of biomarkers and gene fusion detection in various diseases, including cancer and neurological disorders. RNA-seq is also finding applications in infectious diseases, microbiome analysis, environmental monitoring, agricultural genomics, and forensic science. Sequencing costs are decreasing, making RNA-seq more accessible to researchers and clinicians.

    The ongoing development of sequencing platforms, library preparation, and sample preparation kits continues to drive innovation in the field. The dynamic nature of the market ensures that it remains a vibrant and evolving field, with ongoing research and development in areas such as data visualization, clinical trials, and sequencing depth.

    How is this NGS-based RNA-Seq industry segmented?

    The NGS-based RNA-seq industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    End-user
      Academic and research centers
      Clinical research
      Pharma companies
      Hospitals
    Technology
      Sequencing by synthesis
      Ion semiconductor sequencing
      Single-molecule real-time sequencing
      Others
    Geography
      North America
        US
      Europe
        Germany
        UK
      APAC
        China
        Singapore
      Rest of World (ROW)

    By End-user Insights

    The academic and research centers segment is estimated to witness significant growth during the forecast period.

    The global next-generation sequencing (NGS) market for RNA sequencing (RNA-Seq) is primarily driven by academic and research institutions, including those from universities, research institutes, government entities, biotechnology organizations, and pharmaceutical companies. These institutions utilize NGS technology for various research applications, such as whole-genome sequencing, epigenetics, and emerging fields like agrigenomics and animal research, to enhance crop yield and nutritional composition. NGS-based RNA-Seq plays a pivotal role in translational research, with significant investments from both private and public organizations fueling its growth. The technology is instrumental in disease research, enabling the identification

  6. Additional file 1: of VIPER: Visualization Pipeline for RNA-seq, a Snakemake...

    • springernature.figshare.com
    txt
    Updated Jun 1, 2023
    + more versions
    Cite
    MacIntosh Cornwell; Mahesh Vangala; Len Taing; Zachary Herbert; Johannes Köster; Bo Li; Hanfei Sun; Taiwen Li; Jian Zhang; Xintao Qiu; Matthew Pun; Rinath Jeselsohn; Myles Brown; X. Liu; Henry Long (2023). Additional file 1: of VIPER: Visualization Pipeline for RNA-seq, a Snakemake workflow for efficient and complete RNA-seq analysis [Dataset]. http://doi.org/10.6084/m9.figshare.6138272.v1
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    figshare
    Authors
    MacIntosh Cornwell; Mahesh Vangala; Len Taing; Zachary Herbert; Johannes Köster; Bo Li; Hanfei Sun; Taiwen Li; Jian Zhang; Xintao Qiu; Matthew Pun; Rinath Jeselsohn; Myles Brown; X. Liu; Henry Long
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Config Example (YAML 6 kb)

  7. CWL run of RNA-seq Analysis Workflow (CWLProv 0.5.0 Research Object)

    • zenodo.org
    • data.niaid.nih.gov
    • +2 more
    bin, zip
    Updated Jan 24, 2020
    Cite
    Farah Zaib Khan; Stian Soiland-Reyes (2020). CWL run of RNA-seq Analysis Workflow (CWLProv 0.5.0 Research Object) [Dataset]. http://doi.org/10.17632/xnwncxpw42.1
    Available download formats: zip, bin
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Farah Zaib Khan; Stian Soiland-Reyes
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This workflow adapts the approach and parameter settings of Trans-Omics for Precision Medicine (TOPMed). The RNA-seq pipeline originated from the Broad Institute. The workflow has five steps in total:

    1. Read alignment using STAR which produces aligned BAM files including the Genome BAM and Transcriptome BAM.
    2. The Genome BAM file is processed using Picard MarkDuplicates, producing an updated BAM file containing information on duplicate reads (such reads can indicate biased interpretation).
    3. SAMtools index is then employed to generate an index for the BAM file, in preparation for the next step.
    4. The indexed BAM file is processed further with RNA-SeQC which takes the BAM file, human genome reference sequence and Gene Transfer Format (GTF) file as inputs to generate transcriptome-level expression quantifications and standard quality control metrics.
    5. In parallel with transcript quantification, isoform expression levels are quantified by RSEM. This step depends only on the output of the STAR tool, and additional RSEM reference sequences.
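
    The dependencies between the five steps above can be written down as a small graph and ordered with Python's standard `graphlib` module (Python 3.9+). This is only an illustrative sketch of the step ordering, not the CWL workflow itself. The step labels are shorthand chosen for the example.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose output it needs.
dependencies = {
    "STAR": set(),
    "MarkDuplicates": {"STAR"},
    "samtools_index": {"MarkDuplicates"},
    "RNA-SeQC": {"samtools_index"},
    "RSEM": {"STAR"},  # depends only on the STAR output
}

# One valid execution order; RSEM may appear anywhere after STAR.
order = list(TopologicalSorter(dependencies).static_order())
```

    Because RSEM depends only on STAR, a workflow engine is free to run it alongside the MarkDuplicates, index and RNA-SeQC chain.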

    For testing and analysis, the workflow authors provided example data created by down-sampling the read files of TOPMed public access data. Chromosome 12 was extracted from the Homo sapiens assembly 38 reference sequence and provided by the workflow authors. The required GTF and RSEM reference data files are also provided. The workflow is well documented, and a detailed set of instructions for the steps performed to down-sample the data is provided for transparency. The availability of example input data, the use of containerization for the underlying software and the detailed documentation were important factors in choosing this specific CWL workflow for CWLProv evaluation.

    This dataset folder is a CWLProv Research Object that captures the Common Workflow Language execution provenance, see https://w3id.org/cwl/prov/0.5.0 or use https://pypi.org/project/cwl

    Steps to reproduce

    To build the research object again, use Python 3 on macOS. Built with:

    • Processor 2.8GHz Intel Core i7
    • Memory: 16GB
    • OS: macOS High Sierra, Version 10.13.3
    • Storage: 250GB
    1. Install cwltool

      pip3 install cwltool==1.0.20180912090223
    2. Install git lfs
      The data download with the git repository requires the installation of Git lfs:
      https://www.atlassian.com/git/tutorials/git-lfs#installing-git-lfs

    3. Get the data and make the analysis environment ready:

      git clone https://github.com/FarahZKhan/cwl_workflows.git
      cd cwl_workflows/
      git checkout CWLProvTesting
      ./topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/download_examples.sh
    4. Run the following commands to create the CWLProv Research Object:

      cwltool --provenance rnaseqwf_0.6.0_linux --tmp-outdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp --tmpdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp topmed-workflows/TOPMed_RNAseq_pipeline/rnaseq_pipeline_fastq.cwl topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/Dockstore.json
      
      zip -r rnaseqwf_0.5.0_mac.zip rnaseqwf_0.5.0_mac
      sha256sum rnaseqwf_0.5.0_mac.zip > rnaseqwf_0.5.0_mac_mac.zip.sha256

    The https://github.com/FarahZKhan/cwl_workflows repository is a frozen snapshot from https://github.com/heliumdatacommons/TOPMed_RNAseq_CWL commit 027e8af41b906173aafdb791351fb29efc044120
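
    The checksum step in the commands above relies on a `sha256sum` binary, which may not be installed on every system. The Python sketch below computes the same SHA-256 hex digest with the standard library. The function name `sha256_of_file` is hypothetical, and the demo hashes a throwaway temporary file, not the research object archive.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks so that
    large archives do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a temporary file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_digest = sha256_of_file(tmp.name)
os.remove(tmp.name)
```

    To check a downloaded archive, compare the returned string with the first field of the corresponding .sha256 file.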

  8. Data from: Comparative Analysis of RNA-Seq Alignment Algorithms and the...

    • omicsdi.org
    xml
    Updated Aug 3, 2011
    Cite
    Eric Pierce, Gregory R Grant, Michael H Farkas, Eric A Pierce (2011). Comparative Analysis of RNA-Seq Alignment Algorithms and the RNA-Seq Unified Mapper (RUM) [Dataset]. https://www.omicsdi.org/dataset/arrayexpress-repository/E-GEOD-26248
    Available download formats: xml
    Dataset updated
    Aug 3, 2011
    Authors
    Eric Pierce, Gregory R Grant, Michael H Farkas, Eric A Pierce
    Variables measured
    Transcriptomics, Multiomics
    Description

    A critical task in high throughput sequencing is aligning millions of short reads to a reference genome. Alignment is especially complicated for RNA sequencing (RNA-Seq) because of RNA splicing. A number of RNA-Seq algorithms are available, and claim to align reads with high accuracy and efficiency while detecting splice junctions. RNA-Seq data is discrete in nature; therefore with reasonable gene models and comparative metrics RNA-Seq data can be simulated to sufficient accuracy to enable meaningful benchmarking of alignment algorithms. The exercise to rigorously compare all viable published RNA-Seq algorithms has not previously been performed. RESULTS: We developed an RNA-Seq simulator that models the main impediments to RNA alignment, including alternative splicing, insertions, deletions, substitutions, sequencing errors, and intron signal. We used this simulator to measure the accuracy and robustness of available algorithms at the base and junction levels. Additionally, we used RT-PCR and Sanger sequencing to validate the ability of the algorithms to detect novel transcript features such as novel exons and alternative splicing in RNA-Seq data from mouse retina. A pipeline based on BLAT was developed to explore the performance of established tools for this problem, and to compare it to the recently developed methods. This pipeline, the RNA-Seq Unified Mapper (RUM) performs comparably to the best current aligners and provides an advantageous combination of accuracy, speed and usability. RNA-Seq of mouse retinal RNA, as described.

  9. New pipeline for mRNA-Seq and ATAC-Seq analysis allows for biological...

    • data.scilifelab.se
    Updated Aug 19, 2024
    Cite
    (2024). New pipeline for mRNA-Seq and ATAC-Seq analysis allows for biological insights without in-depth bioinformatics skills [Dataset]. https://data.scilifelab.se/highlights/cactus/
    Dataset updated
    Aug 19, 2024
    Description

    Salignon et al. created Cactus, a new pipeline that can be used for comprehensive ATAC-Seq and mRNA-Seq data analysis. Cactus contains multiple unique functions compared to other, similar pipelines, e.g. enrichment in chromatin states and ChIP-Seq binding sites.

  10. Additional file 2: of VIPER: Visualization Pipeline for RNA-seq, a Snakemake...

    • springernature.figshare.com
    txt
    Updated Jun 2, 2023
    + more versions
    Cite
    MacIntosh Cornwell; Mahesh Vangala; Len Taing; Zachary Herbert; Johannes Köster; Bo Li; Hanfei Sun; Taiwen Li; Jian Zhang; Xintao Qiu; Matthew Pun; Rinath Jeselsohn; Myles Brown; X. Liu; Henry Long (2023). Additional file 2: of VIPER: Visualization Pipeline for RNA-seq, a Snakemake workflow for efficient and complete RNA-seq analysis [Dataset]. http://doi.org/10.6084/m9.figshare.6138290.v1
    Available download formats: txt
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    figshare
    Authors
    MacIntosh Cornwell; Mahesh Vangala; Len Taing; Zachary Herbert; Johannes Köster; Bo Li; Hanfei Sun; Taiwen Li; Jian Zhang; Xintao Qiu; Matthew Pun; Rinath Jeselsohn; Myles Brown; X. Liu; Henry Long
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Metasheet Example (CSV 600 bytes)

  11. RNAseq RAW DATA of bacterial interactions with avocado roots

    • portaldelainvestigacion.uma.es
    • figshare.com
    Updated 2023
    Cite
    Cazorla, Francisco; Tienda, Sandra (2023). RNAseq RAW DATA of bacterial interactions with avocado roots [Dataset]. https://portaldelainvestigacion.uma.es/documentos/67a9c7cf19544708f8c732b5
    Dataset updated
    2023
    Authors
    Cazorla, Francisco; Tienda, Sandra
    Description

    RNAseq comparing wt strain PcPCL1606 and the derivative mutant ΔdarB, defective in HPR production. RNA was extracted from the rhizosphere samples using a PowerSoil® RNA extraction kit (Qiagen Iberia S.L., Madrid, Spain) following the manufacturer's instructions, and its amount was quantified using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). For the RNAseq experiment, the quantity and quality of RNA were verified by the Genomics and Ultrasequencing Service Unit (University of Malaga), and the RNA was subsequently sequenced using NextSeq550 equipment (Illumina). Processing of the raw reads was carried out by the Centre for Supercomputing and Bioinnovation (University of Malaga). The bacterial RNAseq data analysis was performed with a series of software packages adapted to the experimental model. The software components of the RNAseq analysis pipeline included SeqTrimNext (v.2.0.6) to remove low-quality reads, adapters, organellar DNA and contaminant sequences; BOWTIE (v.2.2.9) to align reads to the genomic reference; Samtools (v. 0.1.19), a package of programs for reading, writing, editing or viewing alignment files in SAM/BAM format (http://www.htslib.org/); and TUXEDO tools (http://cole-trapnell-lab.github.io/cufflinks/manual/), used to assign the aligned RNAseq reads to the different transcripts and estimate their abundance. The abundance of the transcripts was measured in fragments per kilobase of exon per million mapped reads (FPKM). Once the transcripts had been assembled and their FPKM estimated, the transcripts were annotated with the known reference set of genes obtained from the annotated reference file.

    This pipeline is a tool developed by the Andalusian Platform for Bioinformatics (PAB; http://www.scbi.uma.es/site/omics/bioinformatics) for differential expression analysis of RNAseq data on a genomic reference. The subsequent analysis of differential expression, with a method analogous to that used for differentially expressed sequences, and the graphical representation of the expression results were done using the 'cummeRbund' R package (v. 2.42.0). The resulting table of FPKM values was used to obtain a list of differentially expressed genes with a p-value less than 0.05.
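
    The abundance unit used above, FPKM, normalises a fragment count by both transcript length and sequencing depth. The Python sketch below shows the textbook formula only. Cufflinks estimates abundances with a more elaborate statistical model, so this is not a reimplementation of that tool, and the function name `fpkm` and the numbers are made up for illustration.

```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million mapped fragments."""
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


# 500 fragments on a 2 kb transcript in a library of 10 million mapped fragments.
value = fpkm(fragments=500, exon_length_bp=2000, total_mapped_fragments=10_000_000)  # 25.0
```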

  12. RNA-seq analysis of the transcriptome from Sulfur Deprivation Chlamydomonas...

    • datamed.org
    Updated May 2, 2014
    Cite
    (2014). RNA-seq analysis of the transcriptome from Sulfur Deprivation Chlamydomonas cells [Dataset]. https://datamed.org/display-item.php?repository=0006&id=5913bc0f5152c62a9fc24723&query=NIT2%20replete
    Dataset updated
    May 2, 2014
    Description

    The Chlamydomonas reinhardtii transcriptome was characterized from nutrient-replete and sulfur-depleted wild-type and snrk2.1 mutant cells; the mutant is null for the regulatory serine-threonine kinase SNRK2.1, which is required for acclimation to sulfur deprivation. The transcriptome analyses involved microarray hybridization and RNA-seq technology; RT-qPCR evaluation of the data obtained by these techniques showed that RNA-seq is significantly more quantitative than microarray hybridizations. Sulfur-deprivation-responsive transcripts included those encoding proteins involved in sulfur acquisition and assimilation, recycling of sulfur-containing amino acids, synthesis of reduced sulfur metabolites and cofactors, and modification of cellular structures such as the cell wall and complexes associated with the photosynthetic apparatus. Moreover, the data suggest that cells deprived of sulfur favor accumulation of proteins with fewer sulfur-containing amino acids. Most of these sulfur-deprivation responses are controlled by the SNRK2.1 kinase. Furthermore, the snrk2.1 mutant exhibits a set of unique responses during both sulfur-replete and sulfur-depleted conditions that are not observed in wild-type cells. Many of these responses are likely to be elicited by singlet O2 accumulation in the mutant cells. The transcriptome results for the wild-type and mutant cells strongly suggest the occurrence of massive changes in cellular physiology and metabolism as the cells become depleted of sulfur, and reveal aspects of acclimation that are likely critical for cell survival.

    The three supplementary files GSE17970_supplemental_table_*.xls below include results of the differential expression analysis (expression estimates, fold changes and p-values), and different clusters of functionally related genes.

    Chlamydomonas strains used for this study were D66 (wt strain; nit2 cw15 mt+), ars11 (snrk2.1 cw15 mt+), 21gr (wt strain; nit5 mt-) and sac1 (sac1 mt+). The ars11 strain was designated as the snrk2.1 mutant throughout since the lesion is in the SNRK2.1 gene. Cells were cultured under continuous light of ~60 μmol photons m-2 s-1 at 23 °C in liquid and on solid Tris-Acetate-Phosphate (TAP) medium. To impose S deprivation, cells in mid-logarithmic growth phase were washed twice with liquid TAP medium without S (TAP-S), and equal numbers of cells were resuspended in TAP or TAP-S. Cell aliquots were collected for RNA isolation just prior to and 6 h after being transferred to TAP and TAP-S medium. Total RNA from wt (D66) and snrk2.1 mutant cells after 0 and 6 h of being transferred to -S medium was submitted to Illumina for sequencing using their proprietary Genome Analyzer. cDNA libraries were assembled according to the manufacturer's RNA-seq protocol, loaded and sequenced as 35-mers in a total of 11 Solexa lanes. Raw image files were collected by the sequencer and analyzed using the standard Solexa pipeline.

  13. Data from: RNAseq analysis of the response of Arabidopsis thaliana to...

    • catalog.data.gov
    • datasets.ai
    • +2 more
    Updated Apr 24, 2025
    + more versions
    Cite
    National Aeronautics and Space Administration (2025). RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight [Dataset]. https://catalog.data.gov/dataset/rnaseq-analysis-of-the-response-of-arabidopsis-thaliana-to-fractional-gravity-under-blue-l-5c55d
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    NASA http://nasa.gov/
    Description

    Traveling to nearby extraterrestrial objects having a reduced gravity level (partial gravity) compared to Earth's gravity is becoming a realistic objective for space agencies. The use of plants as part of life support systems will require a better understanding of the interactions among plant growth responses including tropisms, under partial gravity conditions. Here, we present results from our latest space experiments on the ISS, in which seeds of Arabidopsis thaliana were germinated, and seedlings grew for six days under different gravity levels, namely micro-g, several intermediate partial-g levels, and 1g, and were subjected to irradiation with blue light for the last 48 hours. RNA was extracted from 20 samples for subsequent RNAseq analysis. Transcriptomic analysis was performed using the HISAT2-Stringtie-DESeq pipeline. Differentially expressed genes were further characterized for global responses using the GEDI tool, gene networks and for Gene Ontology (GO) enrichment.

  14. Additional file 1 of NetSeekR: a network analysis pipeline for RNA-Seq time...

    • springernature.figshare.com
    txt
    Updated May 30, 2023
    Cite
    Himangi Srivastava; Drew Ferrell; George V. Popescu (2023). Additional file 1 of NetSeekR: a network analysis pipeline for RNA-Seq time series data [Dataset]. http://doi.org/10.6084/m9.figshare.19090649.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Himangi Srivastava; Drew Ferrell; George V. Popescu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 1. Brief description of functions implemented in NetSeekR.

  15. Results of "Curare and GenExVis: A versatile toolkit for analyzing and...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated 2024
    Cite
    Patrick Blumenkamp; Max Pfister; Sonja Diedrich; Karina Brinkrolf; Sebastian Jaenicke; Alexander Goesmann (2024). Results of "Curare and GenExVis: A versatile toolkit for analyzing and visualizing RNA-Seq data" [Dataset]. http://doi.org/10.5281/zenodo.10362480
    Explore at:
    Available download formats: zip
    Dataset updated
    2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Patrick Blumenkamp; Max Pfister; Sonja Diedrich; Karina Brinkrolf; Sebastian Jaenicke; Alexander Goesmann
    Description

    Even though high-throughput transcriptome sequencing is routinely performed in many laboratories, computational analysis of such data remains a cumbersome process often executed manually, hence error-prone and lacking reproducibility. For corresponding data processing, we introduce Curare, an easy-to-use yet versatile workflow builder for analyzing high-throughput RNA-Seq data focusing on differential gene expression experiments. Data analysis with Curare is customizable and subdivided into preprocessing, quality control, mapping, and downstream analysis stages, providing multiple options for each step while ensuring the reproducibility of the workflow. For a fast and straightforward exploration and visualization of differential gene expression results, we provide the gene expression visualizer software GenExVis. GenExVis can create various charts and tables from simple gene expression tables and DESeq2 results without the requirement to upload data or install software packages.

  16. Data from: RNA Sequencing-Based Single Sample Predictors of Molecular...

    • data.mendeley.com
    Updated Aug 19, 2022
    + more versions
    Cite
    Johan Vallon-Christersson (2022). RNA Sequencing-Based Single Sample Predictors of Molecular Subtype and Risk of Recurrence for Clinical Assessment of Early-Stage Breast Cancer [Dataset]. http://doi.org/10.17632/yzxtxn4nmd.2
    Explore at:
    Dataset updated
    Aug 19, 2022
    Authors
    Johan Vallon-Christersson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Gene expression data and associated supplementary files from RNAseq of breast cancer samples from Staaf et al. npj Breast Cancer 2022 (source reference below). Library preparation for mRNA-sequencing was done by a stranded dUTP mRNA protocol or by Illumina stranded TruSeq mRNA protocol. Expression data (Fragments Per Kilobase per Million reads, FPKM) was generated by an analysis pipeline utilizing Hisat/StringTie with GRCh38 human genome primary assembly and GENCODE Release 27 transcripts/genes. Gene expression data is summarized on GENCODE gene identifier. Gene and transcript definitions and gene annotations are from GENCODE Release 27.
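
The FPKM unit mentioned above has a standard definition: fragments mapped to a gene, divided by the gene length in kilobases and by the total mapped fragments in millions. A minimal Python sketch of that arithmetic follows. The function name and the example numbers are illustrative assumptions and are not taken from the dataset; the published values come from the Hisat/StringTie pipeline, not from this code.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    fragments: fragments assigned to the gene (hypothetical input)
    gene_length_bp: gene or transcript length in base pairs
    total_mapped_fragments: all mapped fragments in the library
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


# Illustrative numbers only: 500 fragments on a 2 kb gene, 25 M mapped fragments.
example = fpkm(500, 2_000, 25_000_000)
print(example)
```

Because FPKM divides by library size, values are comparable across genes within a sample but need care when compared across samples with different compositions.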

    Detailed description including material and methods for RNAseq, Hisat/StringTie analysis pipeline, and the development of the Single Sample Predictor (SSP) models for Breast Cancer is available in Staaf et al. npj Breast Cancer 2022 (source reference below).

    The developed SSP models are available as an R package available at GitHub (reference below).

  17. RNA Sequencing Technologies Market Report | Global Forecast From 2025 To...

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    + more versions
    Cite
    Dataintelo (2025). RNA Sequencing Technologies Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-rna-sequencing-technologies-market
    Explore at:
    Available download formats: pdf, pptx, csv
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    RNA Sequencing Technologies Market Outlook



    The global RNA sequencing technologies market size was valued at $2.3 billion in 2023 and is poised to grow to $9.7 billion by 2032, exhibiting a robust CAGR of 16.9% during the forecast period. This impressive growth can be attributed to the increasing demand for personalized medicine and advancements in biotechnology, which have propelled the adoption of RNA sequencing technologies across various sectors.
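
A compound annual growth rate relates a start value, an end value and a number of years through CAGR = (end / start)^(1 / years) - 1. The sketch below applies that formula to the two market sizes quoted above. The choice of 2023 to 2032 as the span is an assumption, since the report does not state which base year its quoted rate uses, so the computed figure need not equal the 16.9% given in the text.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Market sizes in billions of USD, as quoted in the description.
# Assumption: growth is measured from 2023 to 2032 (9 annual steps).
rate = cagr(2.3, 9.7, 2032 - 2023)
print(f"Implied CAGR over 9 years: {rate:.1%}")
```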



    The primary growth factor driving the RNA sequencing technologies market is the increasing focus on personalized medicine. As healthcare moves towards more targeted and individualized treatment plans, RNA sequencing enables a deeper understanding of the genetic and molecular underpinnings of diseases. This, in turn, facilitates the development of more effective treatments and therapies tailored to individual patients. Additionally, technological advancements in sequencing methods and bioinformatics tools have significantly lowered the costs and increased the accuracy and efficiency of RNA sequencing, further boosting its adoption.



    Another significant growth factor is the rising prevalence of chronic diseases and conditions such as cancer, cardiovascular diseases, and neurological disorders. These complex diseases require detailed molecular and genetic profiling for effective diagnosis and treatment. RNA sequencing provides a comprehensive view of the transcriptome, making it an invaluable tool in the detection and understanding of disease mechanisms. This has led to increased investments in RNA sequencing applications by pharmaceutical and biotechnology companies, as well as academic and research institutions.



    Furthermore, the expanding scope of RNA sequencing in drug discovery and development is a crucial driver of market growth. By offering insights into gene expression and regulation, RNA sequencing helps identify potential drug targets and biomarkers, accelerating the drug development process. This has led to a surge in collaborative research efforts and partnerships between sequencing technology providers and pharmaceutical companies. As the demand for novel therapeutics continues to rise, the role of RNA sequencing in the drug discovery pipeline is expected to become even more significant.



    mRNA Sequencing has emerged as a pivotal component within the broader RNA sequencing technologies landscape. This method focuses on capturing the messenger RNA molecules present in a sample, providing insights into the actively expressed genes at any given moment. The precision of mRNA Sequencing allows researchers to explore the dynamic nature of gene expression, making it invaluable for understanding cellular responses to environmental changes, disease states, and developmental processes. As the demand for personalized medicine grows, mRNA Sequencing offers the potential to tailor treatments based on an individual's unique gene expression profile, thus enhancing therapeutic efficacy and minimizing adverse effects.



    Regionally, North America holds a dominant position in the RNA sequencing technologies market, attributed to the presence of major biotechnology firms and advanced research infrastructures. Additionally, favorable regulatory environments and substantial government funding for genomics research further support market growth in this region. However, the Asia Pacific region is anticipated to exhibit the highest CAGR during the forecast period, driven by increasing healthcare investments, growing awareness of personalized medicine, and a burgeoning biotech sector.



    Technology Analysis



    Single-cell RNA Sequencing Analysis



    Single-cell RNA sequencing (scRNA-seq) is a powerful technology that enables the analysis of gene expression at the individual cell level, providing a high-resolution view of cellular heterogeneity. This technology has revolutionized our understanding of complex biological systems, including cancer, immune responses, and developmental biology. The ability to profile thousands of cells simultaneously has led to significant advancements in identifying rare cell populations and understanding cellular functions within tissues. As a result, scRNA-seq is increasingly being adopted by academic and research institutions for basic and translational research.



    The market for scRNA-seq is driven by the continuous innovations in sequencing platforms and data analysis tools, which have made the technology more

  18. Data from: LsRTDv1: A reference transcript dataset for accurate...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1more
    zip
    Updated May 29, 2024
    Cite
    Katherine Denby; Mehmet Fatih Kara; Wenbin Guo; Runxuan Zhang (2024). LsRTDv1: A reference transcript dataset for accurate transcript-specific expression analysis in lettuce [Dataset]. http://doi.org/10.5061/dryad.xwdbrv1m8
    Explore at:
    Available download formats: zip
    Dataset updated
    May 29, 2024
    Dataset provided by
    James Hutton Institute
    University of York
    Authors
    Katherine Denby; Mehmet Fatih Kara; Wenbin Guo; Runxuan Zhang
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Accurate quantification of gene and transcript-specific expression, with the underlying knowledge of precise transcript isoforms, is crucial to understanding many biological processes. Analysis of RNA sequencing data has benefited from the development of alignment-free algorithms which enhance the precision and speed of expression analysis. However, such algorithms require a reference transcriptome. Here we present a reference transcript dataset (LsRTDv1) for lettuce, combining long- and short-read sequencing with publicly available transcriptome annotations, and filtering to keep only transcripts with high-confidence splice junctions and transcriptional start and end sites. LsRTDv1 is a valuable resource for the investigation of transcriptional and alternative splicing regulation in lettuce.

    Methods

    We generated a lettuce Reference Transcript Dataset (LsRTDv1) by integrating transcript assemblies from short- and long-read RNA sequencing data with existing lettuce genome annotations. RNA sequencing data was generated from 23 different lettuce samples capturing different tissues, ages of plant and treatments. The 23 samples, all from Lactuca sativa cv. Saladin (synonymous with cv. Salinas) were combined equally into 7 samples prior to sequencing.

    Short-read assembly

    The RNA-seq reads of the seven pooled samples were pre-processed with Fastp (Chen et al., 2018) to remove adapters and filter low-quality reads (quality score <20, length <30). Trimmed reads were mapped to the latest lettuce reference genome assembly in NCBI (Lsat_Salinas_v11) using STAR aligner in the 2-pass mode to increase the mapping sensitivity at splice junctions (SJs) (Dobin and Gingeras, 2015). Mismatch was set to 1 with minimum and maximum intron sizes of 60 and 15,000 bp respectively. Two transcript assemblers, StringTie (Pertea et al., 2015) and Scallop (Shao and Kingsford, 2017), were used to assemble transcripts for each sample. The assemblies were then merged and refined using RTDmaker (https://github.com/anonconda/RTDmaker) to remove low-quality transcripts, including redundant transcripts with identical intron combinations to longer transcripts, fragmented transcripts with length <70% of gene length, transcripts with non-canonical SJs, transcripts with SJs only supported by <5 spliced reads in <2 samples and low expressed transcripts with <1 transcript per million reads (TPM) in <2 samples.

    Long-read assembly

    We employed the IsoSeq pipeline (https://github.com/PacificBiosciences/IsoSeq) to pre-process the Iso-seq data from the seven samples. The CCS method was used to generate circular consensus sequences (CCS) from raw subreads and reads with minimum predicted accuracy <90% were discarded (--min-rq=0.9). Barcodes associated with the CCS reads were eliminated using the lima method. To further refine the reads, Isoseq3 was applied to trim poly(A) tails and identify and remove concatemers. The output of full-length, non-concatemer (FLNC) reads was mapped to the reference genome using Minimap2 (Li, 2018). TAMA-collapse was used to collapse redundant transcript models in each sample with variation at the 5’ and 3’ ends and at SJs not allowed (-a = 0, -m = 0 and -z = 0) to ensure high accuracy of boundaries. Reads with errors within the 10 bp up- or down-stream of a SJ were removed. TAMA-merge was used to merge transcript models from the seven samples (Kuo et al., 2020). To improve the quality of the assembly, we implemented well-established methods for SJ and transcript start site (TSS) and end site (TES) analyses previously used for Arabidopsis AtRTD3 and barley BaRTv2 (Zhang et al., 2022b; Coulter et al., 2022). We removed low-quality transcripts that exhibited non-canonical SJs and low quality SJs unless they were also present in the short-read assembly. We applied a binomial test to distinguish high-confidence TSS and TES with a false discovery rate (FDR) <0.05. For genes with limited read support, statistical testing becomes challenging, hence we also kept TSS/TES if they were supported by at least 2 Iso-seq reads. Redundancy merge was applied to transcripts if they only differed ±50 nucleotides at their TSS/TES. In addition, transcripts only supported by a single Iso-seq read were removed from the final dataset.

    Integration of multiple annotations

    We integrated four transcript annotations: the long-read assembly, short-read assembly and two versions of Lsat_Salinas_v11 genome annotations GenBank (GCA_002870075.4) and RefSeq (GCF_002870075.4). The Iso-seq long-read assembly served as the reliable backbone, while the other three annotations were incorporated in a step-wise manner to improve the RTD completeness. Firstly, the transcripts in the short-read assembly that introduce novel SJs and/or novel gene loci were integrated into the long-read assembly. Subsequently, we added transcripts from GenBank and RefSeq annotations that contributed novel SJs or gene loci to build the lettuce RTD (LsRTDv1). In cases where two transcripts from GenBank and RefSeq had identical SJ combinations or were mono-exonic transcripts with overlapping regions exceeding 30% of both transcripts, we collapsed them to a single transcript, and the longest TSS and TES were used as the start and end point of the collapsed transcript. In LsRTDv1, the overlapped transcripts were assigned the same gene ID. However, if a set of overlapped transcripts entirely resided within the intron region of other transcripts, they were treated as intronic transcripts and assigned with a different gene ID. Where the overlapped transcripts can be divided into multiple groups and the adjacent groups overlapped less than 5% of the group lengths, they were assigned separate gene IDs.

  19. Gene expression count data from human post-mortem spinal cord

    • zenodo.org
    application/gzip
    Updated Mar 26, 2022
    Cite
    Jack Humphrey (2022). Gene expression count data from human post-mortem spinal cord [Dataset]. http://doi.org/10.5281/zenodo.6385747
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Mar 26, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jack Humphrey
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Gene expression data from human post-mortem tissue for three spinal cord sections (cervical, thoracic and lumbar) from amyotrophic lateral sclerosis (ALS) patients and non-neurological disease controls. RNA sequencing performed as part of the New York Genome Center ALS Consortium.

    Analysis workbooks: https://jackhump.github.io/ALS_SpinalCord_QTLs/

    Preprint describing results: https://www.medrxiv.org/content/10.1101/2021.08.31.21262682v1

    Sample sizes:

    Region      Control   ALS
    Cervical    35        139
    Thoracic    10        42
    Lumbar      32        122

    Library preparation

    RNA was extracted from flash-frozen postmortem tissue using TRIzol (Thermo Fisher Scientific) chloroform, followed by column purification (RNeasy Minikit, QIAGEN). RNA integrity number (RIN) was assessed on a Bioanalyzer (Agilent Technologies). RNA-Seq libraries were prepared from 500ng total RNA using the KAPA Stranded RNA-Seq Kit with RiboErase (KAPA Biosystems) for rRNA depletion and Illumina-compatible indexes (NEXTflex RNA-Seq Barcodes, NOVA-512915, PerkinElmer, and IDT for Illumina TruSeq UD Indexes, 20022370). Pooled libraries (average insert size: 375 bp) passing the quality criteria were sequenced either on an Illumina HiSeq 2500 (125 bp paired end) or an Illumina NovaSeq (100 bp paired-end). The samples had a median sequencing depth of 42 million read pairs, with a range between 16 and 167 million read pairs.

    Data processing

    Samples were uniformly processed using RAPiD-nf, an efficient RNA-Seq processing pipeline implemented in the NextFlow framework. Following adapter trimming with Trimmomatic (version 0.36), all samples were aligned to the hg38 build (GRCh38.primary_assembly) of the human reference genome using STAR (2.7.2a), with indexes created from GENCODE, version 30. Gene expression was quantified using RSEM (1.3.1) with the GENCODE v30 annotation. Quality control was performed using SAMtools and Picard, and the results were collated using MultiQC. Various technical metrics for sequencing quality control are provided in the metadata. Estimated read counts and normalised transcripts per million (TPM) matrices are provided for each tissue.

    Provided data:

    gencode.v30.gene_meta.tsv.gz - tab separated table with columns "genename", the HGNC gene symbol, and "geneid" the Ensembl ID, as set in the GENCODE v30 comprehensive annotation.

    For {tissue} in Cervical_Spinal_Cord, Thoracic_Spinal_Cord, Lumbar_Spinal_Cord:

    {tissue}_metadata.tsv.gz - metadata describing each sample. Each row describes a sample. Descriptions of each column below.

    {tissue}_gene_tpm.tsv.gz - the normalised TPM values from RSEM for all 58,884 genes in GENCODE v30. Each row describes a gene and each column describes a sample.

    {tissue}_gene_counts.tsv.gz - the estimated read counts from RSEM for all 58,884 genes in GENCODE v30. Each row describes a gene and each column describes a sample.
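
The file layout listed above can be read with the Python standard library alone. The sketch below is a minimal, hedged example: `tissue_files` and `read_gene_matrix` are hypothetical helper names, and the assumption that the first header cell labels the gene column while the remaining cells are sample IDs is inferred from the description ("each row describes a gene and each column describes a sample"), not confirmed against the files themselves.

```python
import csv
import gzip

TISSUES = ("Cervical_Spinal_Cord", "Thoracic_Spinal_Cord", "Lumbar_Spinal_Cord")


def tissue_files(tissue):
    """Build the three per-tissue file names following the naming pattern above."""
    kinds = ("metadata", "gene_tpm", "gene_counts")
    return {kind: f"{tissue}_{kind}.tsv.gz" for kind in kinds}


def read_gene_matrix(path):
    """Read a gzipped, tab-separated gene-by-sample matrix.

    Assumed layout: one header row (gene column label, then sample IDs),
    followed by one row per gene with numeric values.
    Returns (sample_ids, {gene_id: [values...]}).
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        samples = header[1:]
        values = {}
        for row in reader:
            if not row:
                continue  # skip blank lines
            values[row[0]] = [float(x) for x in row[1:]]
    return samples, values


for tissue in TISSUES:
    print(tissue_files(tissue)["gene_tpm"])
```

For the full matrices (58,884 genes per tissue) a dataframe library would likely be more convenient; the standard-library version is shown only to keep the example dependency-free.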

    Metadata Column Description

    rna_id - de-identified sample ID for each unique RNA-seq sample

    dna_id - de-identified donor ID for each patient enrolled in the study

    site_id - de-identified site name for each contributing site

    tissue - name of tissue/region

    age_rounded - age at death, rounded to nearest decade

    sex - biological sex of donor

    subject_group - long form disease group

    disease - short form disease group

    site_of_motor_onset - for ALS donors, where did symptoms start?

    disease_duration - for ALS donors, how long did donor live with disease?

    mutations - any known ALS gene mutations

    library_prep - type of library preparation method used

    seq_platform - sequencing platform used for sequencing

    rin - RNA integrity number, 0-10

    c9orf72_repeat_size - estimated C9orf72 repeat expansion size

    gPC1 - gPC5 - principal component of genetic ancestry from whole genome sequencing

    Remaining metadata columns are from Picard - see here: http://broadinstitute.github.io/picard/picard-metric-definitions.html#RnaSeqMetrics

  20. Raw and processed (filtered and annotated) scRNAseq data

    • figshare.com
    zip
    Updated Jun 12, 2023
    Cite
    Gabrielle Leclercq-Cohen; Sabrina Danilin; Llucia Alberti-Servera; Stephan Schmeing; Hélène Haegel; Sina Nassiri; Marina Bacac (2023). Raw and processed (filtered and annotated) scRNAseq data [Dataset]. http://doi.org/10.6084/m9.figshare.23499192.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 12, 2023
    Dataset provided by
    figshare
    Authors
    Gabrielle Leclercq-Cohen; Sabrina Danilin; Llucia Alberti-Servera; Stephan Schmeing; Hélène Haegel; Sina Nassiri; Marina Bacac
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Single cell RNA-seq data generated and reported as part of the manuscript entitled "Dissecting the mechanisms underlying the Cytokine Release Syndrome (CRS) mediated by T Cell Bispecific Antibodies" by Leclercq-Cohen et al 2023. Raw and processed (filtered and annotated) data are provided as AnnData objects which can be directly ingested to reproduce the findings of the paper or for ab initio data reuse:

    (1) raw.zip provides concatenated raw/unfiltered counts for the 20 samples in the standard Market Exchange Format (MEX).
    (2) 230330_sw_besca2_LowFil_raw.h5ad contains filtered cells and raw counts in the HDF5 format.
    (3) 221124_sw_besca2_LowFil.annotated.h5ad contains filtered cells and log normalized counts, along with cell type annotation, in the HDF5 format.

    scRNAseq data generation:

    Whole blood from 4 donors was treated with 0.2 μg/mL CD20-TCB, or incubated in the absence of CD20-TCB. At baseline (before addition of TCB) and assay endpoints (2, 4, 6, and 20 hrs), blood was collected for total leukocyte isolation using EasySep™ red blood cell depletion reagent (Stemcell). Briefly, cells were counted and processed for single cell RNA sequencing using the BD Rhapsody platform. To load several samples on a single BD Rhapsody cartridge, sample cells were labelled with sample tags (BD Human Single-Cell Multiplexing Kit) following the manufacturer’s protocol prior to pooling. Briefly, 1 × 10⁶ cells from each sample were re-suspended in 180 μL FBS Stain Buffer (BD, PharMingen) and sample tags were added to the respective samples and incubated for 20 min at RT. After incubation, 2 successive washes were performed by addition of 2 mL stain buffer and centrifugation for 5 min at 300 g. Cells were then re-suspended in 620 μL cold BD Sample Buffer, stained with 3.1 μL of both 2 mM Calcein AM (Thermo Fisher Scientific) and 0.3 mM Draq7 (BD Biosciences) and finally counted on the BD Rhapsody scanner. Samples were then diluted and/or pooled equally in 650 μL cold BD Sample Buffer. The BD Rhapsody cartridges were then loaded with up to 40 000 – 50 000 cells. Single cells were isolated using Single-Cell Capture and cDNA Synthesis with the BD Rhapsody Express Single-Cell Analysis System according to the manufacturer’s recommendations (BD Biosciences). cDNA libraries were prepared using the Whole Transcriptome Analysis Amplification Kit following the BD Rhapsody System mRNA Whole Transcriptome Analysis (WTA) and Sample Tag Library Preparation Protocol (BD Biosciences). Indexed WTA and sample tags libraries were quantified and quality controlled on the Qubit Fluorometer using the Qubit dsDNA HS Assay, and on the Agilent 2100 Bioanalyzer system using the Agilent High Sensitivity DNA Kit. Sequencing was performed on a Novaseq 6000 (Illumina) in paired-end mode (64-8-58) with Novaseq6000 S2 v1 or Novaseq6000 SP v1.5 reagents kits (100 cycles).

    scRNAseq data analysis:

    Sequencing data was processed using the BD Rhapsody Analysis pipeline (v 1.0 https://www.bd.com/documents/guides/user-guides/GMX_BD-Rhapsody-genomics-informatics_UG_EN.pdf) on the Seven Bridges Genomics platform. Briefly, read pairs with low sequencing quality were first removed and the cell label and UMI identified for further quality check and filtering. Valid reads were then mapped to the human reference genome (GRCh38-PhiX-gencodev29) using the aligner Bowtie2 v2.2.9, and reads with the same cell label, same UMI sequence and same gene were collapsed into a single raw molecule while undergoing further error correction and quality checks. Cell labels were filtered with a multi-step algorithm to distinguish those associated with putative cells from those associated with noise. After determining the putative cells, each cell was assigned to the sample of origin through the sample tag (only for cartridges with multiplex loading). Finally, the single-cell gene expression matrices were generated and a metrics summary was provided.

    After pre-processing with BD’s pipeline, the count matrices and metadata of each sample were aggregated into a single adata object and loaded into the besca v2.3 pipeline for the single cell RNA sequencing analysis (43). First, we filtered low quality cells with fewer than 200 genes, fewer than 500 counts or more than 30% of mitochondrial reads. This permissive filtering was used in order to preserve the neutrophils. We further excluded potential multiplets (cells with more than 5,000 genes or 20,000 counts), and genes expressed in fewer than 30 cells. Normalization, log-transformed UMI counts per 10,000 reads [log(CP10K+1)], was applied before downstream analysis. After normalization, technical variance was removed by regressing out the effects of total UMI counts and percentage of mitochondrial reads, and gene expression was scaled. The 2,507 most variable genes (having a minimum mean expression of 0.0125, a maximum mean expression of 3 and a minimum dispersion of 0.5) were used for principal component analysis. Finally, the first 50 PCs were used as input for calculating the 10 nearest neighbours and the neighbourhood graph was then embedded into the two-dimensional space using the UMAP algorithm at a resolution of 2. Cell type annotation was performed using the Sig-annot semi-automated besca module, which is a signature-based hierarchical cell annotation method. The used signatures, configuration and nomenclature files can be found at https://github.com/bedapub/besca/tree/master/besca/datasets. For more details, please refer to the publication.
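
The cell-level thresholds and the log(CP10K+1) normalization described above can be illustrated with a few lines of plain Python. This is a sketch under stated assumptions, not the besca implementation: `keep_cell` and `log_cp10k` are hypothetical names, the logarithm is assumed to be the natural log, and the gene-level filter (genes expressed in fewer than 30 cells) is left out because it needs the full matrix.

```python
import math

# Thresholds quoted in the description above.
MIN_GENES, MIN_COUNTS, MAX_PCT_MITO = 200, 500, 30.0
MAX_GENES, MAX_COUNTS = 5_000, 20_000


def keep_cell(n_genes, n_counts, pct_mito):
    """Return True if a cell passes both the low-quality and multiplet filters."""
    low_quality = (
        n_genes < MIN_GENES or n_counts < MIN_COUNTS or pct_mito > MAX_PCT_MITO
    )
    multiplet = n_genes > MAX_GENES or n_counts > MAX_COUNTS
    return not (low_quality or multiplet)


def log_cp10k(counts):
    """Scale one cell's UMI counts to 10,000 total, then apply log(x + 1).

    Assumption: natural logarithm; the description does not state the base.
    """
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [math.log1p(c * 10_000 / total) for c in counts]


# Toy cells: (genes detected, total UMI counts, % mitochondrial reads).
toy_cells = [(1_500, 4_000, 5.0), (150, 300, 2.0), (6_000, 30_000, 3.0)]
print([keep_cell(*cell) for cell in toy_cells])
```

The permissive lower bounds matter here because, as the description notes, stricter cut-offs would remove neutrophils, which tend to have few detected genes per cell.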
