Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets produced during the validation of CWL-based pipelines designed for the analysis of data from RNA-Seq, ChIP-Seq and germline variant calling experiments. Specifically, the workflows were tested using publicly available high-throughput sequencing (HTS) data from published studies on Chronic Lymphocytic Leukemia (CLL) (accession numbers: E-MTAB-6962, GSE115772) and Genome in a Bottle (GIAB) project samples (accession numbers: SRR6794144, SRR22476789, SRR22476790, SRR22476791).
The supporting data include:
Differential transcript and gene expression results produced during the analysis with the CWL-based RNA-Seq pipeline
Bigwig and narrowPeak files, differential binding results, table of consensus peaks and read counts of EZH2 and H3K27me3, produced during the analysis with the CWL-based ChIP-Seq pipeline
VCF files containing the detected and filtered variants, along with the respective hap.py results from comparisons against the GIAB gold standard truth sets, for both CWL-based germline variant calling pipelines
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Additional file 3: Table S1. Available configuration profiles. Configuration files exist under the SPEAQeasy/conf directory. Configuration profiles exist for SGE and SLURM clusters, as well as local execution on a Linux machine. These profiles can be customized for specific clusters, such as the JHPCE configuration file jhpce.config, which runs on an SGE cluster. The file a user chooses also depends on whether software dependencies are managed with Docker or are installed locally.
Table S2. SPEAQeasy output files. Table of intermediary outputs generated by SPEAQeasy. These do not include the major output files of interest (Fig. 2), but other miscellaneous outputs from each processing step. In the Filename column, brackets denote one or more values dependent on a relevant variable; for example, the files [sample_name]_process_trace.log refer to a set of several files, each named distinctly according to the sample associated with the particular file. An asterisk represents a wildcard matching more than one file, when individual file names may depend on the experiment. For example, [sample_name]_trimmed*.fastq could refer to sample1_trimmed_1.fastq and sample1_trimmed_2.fastq. The next columns provide the directory containing each given file, relative to the output folder, and a description of the files’ content, respectively.
Table S3. Quality metrics recorded in SPEAQeasy outputs. One of the major pipeline outputs is a comma-separated values (CSV) file where fields (columns) are different quality metrics, and each line (row) is associated with one sample. A list of the exact field names and their descriptions is given above.
Table S4. SPEAQeasy-example differential expression and gene ontology results. (A) Differential expression results using the subset of BipSeq data analyzed in http://research.libd.org/SPEAQeasy-example/ . (B) Gene ontology enrichment results from the genes with a p-value
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Analysis of four samples of GEO accession GSE119855 with the IBU RNA-seq pipeline
Background: mRNA interactions with each other and with other signaling molecules define different biological pathways and functions. Researchers have been investigating various tools to analyze these types of interactions. In particular, gene co-expression network methods have proved useful in finding and analyzing these molecular interactions. Many analytical pipelines for identifying these interaction networks have been proposed, with the aim of finding an optimal partition of the network in which the individual modules are neither too small to support general inferences nor too large to be biologically interpretable. Results: In this study, we propose a new pipeline to perform gene co-expression network analysis. The proposed pipeline combines WGCNA, a widely used software package for gene co-expression network analysis, with a modularity maximization algorithm to analyze novel RNA-Seq data and understand the effects of low-dose 56Fe ion irradiation on the formation of hepatocellular carcinoma in mice. The network results, along with experimental validation, show that using WGCNA combined with modularity maximization provides a more biologically interpretable network in our dataset. Our pipeline performed better than the existing clustering algorithm in WGCNA at finding modules and identified a module with mitochondrial subunits that is supported by a mitochondrial complex assay. Conclusions: We present a pipeline that can reduce the problem of parameter selection with the existing algorithm in WGCNA for comparable RNA-Seq datasets, which may assist future research to discover novel mRNA interactions and their downstream molecular effects.
C57BL16 males were placed into 2 treatment groups and received the following irradiation treatments at Brookhaven National Laboratories (Long Island, NY): 600 MeV/n 56Fe (0.2 Gy) and no irradiation. Left liver lobes were collected at 30, 60, 120, 270, and 360 days post-irradiation, flash frozen, and stored at -80 °C until they could be processed for RNA-Seq. Livers were sampled by taking two 40-micron thick slices using a cryotome at -20 °C. This allowed multiple sampling of the tissue without the tissue going through multiple freeze/thaw cycles. Total RNA was isolated from the liver slices using the RNAqueous™ Total RNA Isolation Kit (ThermoFisher Scientific, Waltham, MA), and rRNA was removed via the Ribo-Zero™ rRNA Removal Kit (Illumina, San Diego, CA) prior to library preparation with the Illumina TruSeq RNA Library kit. Samples were sequenced in a paired-end 50 base format on an Illumina HiSeq 1500. Reads were aligned to the mouse GRCm38 reference genome using the STAR alignment program version 2.5.3a with the recommended ENCODE options. The --quantMode GeneCounts option was used to obtain read counts per gene based on the Gencode release M14 annotation file. The total number of reads used in the analysis ranged from 23 to 35 million per sample.
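The per-gene counts mentioned above can be read with a few lines of code. With --quantMode GeneCounts, STAR writes a tab-separated ReadsPerGene.out.tab file. Its first four rows are summary counters (N_unmapped, N_multimapping, N_noFeature, N_ambiguous). Each row holds a name followed by three count columns: unstranded, then the two stranded orientations. The sketch below is only an illustration: the function name, the choice of the unstranded column and the toy gene IDs are assumptions, not part of the original analysis.

```python
import csv
import io

def read_star_gene_counts(handle, column=1):
    """Parse STAR ReadsPerGene.out.tab content into (summary, counts) dicts.

    column=1 selects the unstranded counts; 2 and 3 select the stranded columns.
    """
    summary, counts = {}, {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        name, value = row[0], int(row[column])
        # Rows starting with "N_" are STAR's summary counters, not genes.
        if name.startswith("N_"):
            summary[name] = value
        else:
            counts[name] = value
    return summary, counts

# Toy input with made-up gene IDs and counts (illustrative only).
example = io.StringIO(
    "N_unmapped\t10\t10\t10\n"
    "N_multimapping\t5\t5\t5\n"
    "N_noFeature\t3\t20\t4\n"
    "N_ambiguous\t1\t0\t0\n"
    "geneA\t120\t2\t118\n"
    "geneB\t0\t0\t0\n"
)
summary, counts = read_star_gene_counts(example)
print(counts)
```

Which count column is appropriate depends on the strandedness of the library preparation, which the description above does not state.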
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Additional file 2: Figure S2: SPEAQeasy logs tracing computational steps by sample. To aid transparency and greatly simplify identifying the source of execution errors, SPEAQeasy automatically generates logs with several pieces of information for every sample. In order of submission, the name of each Nextflow process is printed, along with (1) the working directory, where all relevant files are present; (2) the exit code, a standard indication of whether the process succeeded or how it failed; and (3) a list of the specific commands run during the given process. Above is a snapshot of the top of an example log.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This workflow adapts the approach and parameter settings of Trans-Omics for Precision Medicine (TOPMed). The RNA-seq pipeline originated from the Broad Institute. The workflow has five steps in total:
Read alignment using STAR which produces aligned BAM files including the Genome BAM and Transcriptome BAM.
The Genome BAM file is processed using Picard MarkDuplicates, producing an updated BAM file containing information on duplicate reads (such reads can bias interpretation).
SAMtools index is then employed to generate an index for the BAM file, in preparation for the next step.
The indexed BAM file is processed further with RNA-SeQC which takes the BAM file, human genome reference sequence and Gene Transfer Format (GTF) file as inputs to generate transcriptome-level expression quantifications and standard quality control metrics.
In parallel with transcript quantification, isoform expression levels are quantified by RSEM. This step depends only on the output of the STAR tool, and additional RSEM reference sequences.
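The dependencies among the five steps above can be sketched as a small directed graph. This is an illustration only: the step labels are shorthand chosen here, not the step IDs used in the CWL workflow, and the ordering is derived purely from the description above.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose output it needs, as described above.
# The step labels are shorthand chosen for this sketch, not CWL step IDs.
dependencies = {
    "star_align": set(),
    "mark_duplicates": {"star_align"},
    "samtools_index": {"mark_duplicates"},
    "rnaseqc": {"samtools_index"},
    "rsem": {"star_align"},
}

def execution_order(deps):
    """Return one valid order in which the steps could be run."""
    return list(TopologicalSorter(deps).static_order())

order = execution_order(dependencies)
print(order)
```

Because the RSEM step depends only on the STAR output, a workflow engine is free to run it alongside the MarkDuplicates, indexing and RNA-SeQC chain.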
For testing and analysis, the workflow authors provided example data created by down-sampling the read files of publicly accessible TOPMed data. Chromosome 12 was extracted from the Homo sapiens Assembly 38 reference sequence and provided by the workflow authors. The required GTF and RSEM reference data files are also provided. The workflow is well documented, and detailed instructions for the steps performed to down-sample the data are also provided for transparency. The availability of example input data, the use of containerization for the underlying software, and the detailed documentation were important factors in choosing this specific CWL workflow for CWLProv evaluation.
This dataset folder is a CWLProv Research Object that captures the Common Workflow Language execution provenance, see https://w3id.org/cwl/prov/0.5.0 or use https://pypi.org/project/cwl
Steps to reproduce
To build the research object again, use Python 3 on macOS. Built with:
Processor 2.8GHz Intel Core i7
Memory: 16GB
OS: macOS High Sierra, Version 10.13.3
Storage: 250GB
Install cwltool
pip3 install cwltool==1.0.20180912090223
Install Git LFS
Downloading the data with the git repository requires Git LFS: https://www.atlassian.com/git/tutorials/git-lfs#installing-git-lfs
Get the data and make the analysis environment ready:
git clone https://github.com/FarahZKhan/cwl_workflows.git
cd cwl_workflows/
git checkout CWLProvTesting
./topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/download_examples.sh
Run the following commands to create the CWLProv Research Object:
cwltool --provenance rnaseqwf_0.6.0_linux --tmp-outdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp --tmpdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp topmed-workflows/TOPMed_RNAseq_pipeline/rnaseq_pipeline_fastq.cwl topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/Dockstore.json
zip -r rnaseqwf_0.5.0_mac.zip rnaseqwf_0.5.0_mac
sha256sum rnaseqwf_0.5.0_mac.zip > rnaseqwf_0.5.0_mac_mac.zip.sha256
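A checksum file written by sha256sum can later be used to confirm that a downloaded archive is intact. The following is a minimal sketch of that check, assuming the usual sha256sum layout (hex digest first, then the file name). The helper names are hypothetical, and the demonstration runs on a throwaway file rather than the real archive.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_checksum_file(path, checksum_path):
    """Compare a file's digest with the first field of a sha256sum-style file."""
    with open(checksum_path) as handle:
        expected = handle.read().split()[0]
    return sha256_of_file(path) == expected.lower()

# Self-contained demonstration on a throwaway file (not the real archive).
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "example.zip")
    with open(archive, "wb") as handle:
        handle.write(b"placeholder bytes")
    with open(archive + ".sha256", "w") as handle:
        handle.write(f"{sha256_of_file(archive)}  example.zip\n")
    demo_ok = matches_checksum_file(archive, archive + ".sha256")
print(demo_ok)
```

Reading in chunks keeps memory use flat even for multi-gigabyte research object archives.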
The https://github.com/FarahZKhan/cwl_workflows repository is a frozen snapshot from https://github.com/heliumdatacommons/TOPMed_RNAseq_CWL commit 027e8af41b906173aafdb791351fb29efc044120
RNA-seq gene count datasets built using the raw data from 18 different studies. The raw sequencing data (.fastq files) were processed with Myrna to obtain tables of counts for each gene. For ease of statistical analysis, we combined each count table with sample phenotype data to form an R object of class ExpressionSet. The count tables, ExpressionSets, and phenotype tables are ready to use and freely available. By taking care of several preprocessing steps and combining many datasets into one easily accessible website, we make finding and analyzing RNA-seq data considerably more straightforward.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Additional file 1. Figure S1: Expected vs. Actual ERCC concentration. SPEAQeasy produces plots for each sample, for easy visual comparison of expected ERCC transcript abundance with the kallisto-measured concentration.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The data provided here are part of a Galaxy Training Network tutorial that analyzes RNA-Seq data from a study published by Brooks et al. (2011) to identify genes and exons that are regulated by the Pasilla gene.
The root apex is an important section of the plant root involved in environmental sensing and cellular development. Analyzing the gene profile of the root apex in diverse environments is important and challenging, especially when the samples are limiting and precious, such as in spaceflight. The feasibility of using tiny root sections for transcriptome analysis was examined in this study. To understand the gene expression profiles of the root apex, Arabidopsis thaliana Col-0 roots were sectioned into Zone-I (0.5 mm; root cap and meristematic zone) and Zone-II (1.5 mm; transition, elongation, and growth-terminating zone). Gene expression was analyzed using microarrays and RNA-Seq. Both techniques, arrays and RNA-Seq, identified 4180 common genes as differentially expressed (with greater than two-fold changes) between the zones. In addition, 771 unique genes and 19 novel TARs that were not detected in the arrays were identified by RNA-Seq as differentially expressed. Single root tip zones can be used for full transcriptome analysis; further, the root apex zones are functionally very distinct from each other. RNA-Seq provided novel information about the transcripts compared to the arrays. These data will help optimize transcriptome techniques for dealing with small, rare samples.
A critical task in high throughput sequencing is aligning millions of short reads to a reference genome. Alignment is especially complicated for RNA sequencing (RNA-Seq) because of RNA splicing. A number of RNA-Seq algorithms are available, and claim to align reads with high accuracy and efficiency while detecting splice junctions. RNA-Seq data is discrete in nature; therefore with reasonable gene models and comparative metrics RNA-Seq data can be simulated to sufficient accuracy to enable meaningful benchmarking of alignment algorithms. The exercise to rigorously compare all viable published RNA-Seq algorithms has not previously been performed. RESULTS: We developed an RNA-Seq simulator that models the main impediments to RNA alignment, including alternative splicing, insertions, deletions, substitutions, sequencing errors, and intron signal. We used this simulator to measure the accuracy and robustness of available algorithms at the base and junction levels. Additionally, we used RT-PCR and Sanger sequencing to validate the ability of the algorithms to detect novel transcript features such as novel exons and alternative splicing in RNA-Seq data from mouse retina. A pipeline based on BLAT was developed to explore the performance of established tools for this problem, and to compare it to the recently developed methods. This pipeline, the RNA-Seq Unified Mapper (RUM) performs comparably to the best current aligners and provides an advantageous combination of accuracy, speed and usability. RNA-Seq of mouse retinal RNA, as described.
NGS-Based RNA-Seq Market Size 2024-2028
The NGS-based RNA-seq market size is forecast to increase by USD 6.66 billion, at a CAGR of 20.52% between 2023 and 2028.
The market is witnessing significant growth, driven by the increased adoption of next-generation sequencing (NGS) methods for RNA-Seq analysis. The advanced capabilities of NGS techniques, such as high-throughput, cost-effectiveness, and improved accuracy, have made them the preferred choice for researchers and clinicians in various fields, including genomics, transcriptomics, and personalized medicine. However, the market faces challenges, primarily from the lack of clinical validation on direct-to-consumer genetic tests. As the use of NGS technology in consumer applications expands, ensuring the accuracy and reliability of results becomes crucial.
The absence of standardized protocols and regulatory oversight in this area poses a significant challenge to market growth and trust. Companies seeking to capitalize on market opportunities must focus on addressing these challenges through collaborations, partnerships, and investments in research and development to ensure the clinical validity and reliability of their NGS-based RNA-Seq offerings.
What will be the Size of the NGS-based RNA-Seq market during the forecast period?
The market continues to evolve, driven by advancements in NGS technology and its applications across various sectors. Spatial transcriptomics, a novel approach to studying gene expression in its spatial context, is gaining traction in disease research and precision medicine. Splice junction detection, a critical component of RNA-seq data analysis, enhances the accuracy of gene expression profiling and differential gene expression studies. Cloud computing plays a pivotal role in handling the massive amounts of data generated by NGS platforms, enabling real-time data analysis and storage. Enrichment analysis, gene ontology, and pathway analysis facilitate the interpretation of RNA-seq data, while data normalization and quality control ensure the reliability of results.
Precision medicine and personalized therapy are key applications of RNA-seq, with single-cell RNA-seq offering unprecedented insights into the complexities of gene expression at the single-cell level. Read alignment and variant calling are essential steps in RNA-seq data analysis, while bioinformatics pipelines and RNA-seq software streamline the process. NGS technology is revolutionizing drug discovery by enabling the identification of biomarkers and gene fusion detection in various diseases, including cancer and neurological disorders. RNA-seq is also finding applications in infectious diseases, microbiome analysis, environmental monitoring, agricultural genomics, and forensic science. Sequencing costs are decreasing, making RNA-seq more accessible to researchers and clinicians.
The ongoing development of sequencing platforms, library preparation, and sample preparation kits continues to drive innovation in the field. The dynamic nature of the market ensures that it remains a vibrant and evolving field, with ongoing research and development in areas such as data visualization, clinical trials, and sequencing depth.
How is this NGS-based RNA-Seq industry segmented?
The NGS-based RNA-seq industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.
End-user
  Academic and research centers
  Clinical research
  Pharma companies
  Hospitals
Technology
  Sequencing by synthesis
  Ion semiconductor sequencing
  Single-molecule real-time sequencing
  Others
Geography
  North America
    US
  Europe
    Germany
    UK
  APAC
    China
    Singapore
  Rest of World (ROW)
By End-user Insights
The academic and research centers segment is estimated to witness significant growth during the forecast period.
The global next-generation sequencing (NGS) market for RNA sequencing (RNA-Seq) is primarily driven by academic and research institutions, including those from universities, research institutes, government entities, biotechnology organizations, and pharmaceutical companies. These institutions utilize NGS technology for various research applications, such as whole-genome sequencing, epigenetics, and emerging fields like agrigenomics and animal research, to enhance crop yield and nutritional composition. NGS-based RNA-Seq plays a pivotal role in translational research, with significant investments from both private and public organizations fueling its growth. The technology is instrumental in disease research, enabling the identification
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Additional file 8: Table S7. List of differentially expressed genes and their relative fold changes identified by the STAR-featureCounts-edgeR pipeline.
We report cytokine-specific changes in gene expression in the human neutrophil transcriptome using TNF-alpha and GM-CSF stimulation of healthy neutrophils. Healthy human neutrophils were stimulated with TNF-alpha or GM-CSF for 1 h in vitro. RNA was analysed by SOLiD and Illumina sequencing. RNA from one biological donor was sequenced on both platforms, and two different biological donors were sequenced by Illumina.
The global RNA-Seq market is anticipated to reach a value of XXX million by 2033, expanding at a CAGR of XX% during the forecast period of 2025-2033. The market is primarily driven by the increasing prevalence of cancer and other chronic diseases, coupled with advancements in RNA sequencing technologies. RNA-Seq is a high-throughput sequencing technique that allows researchers to study the expression of all RNA molecules in a cell or tissue sample. This information can be used to identify biomarkers for diseases, develop new therapies, and understand the mechanisms of gene regulation. The key market trends include the growing adoption of next-generation sequencing (NGS) platforms, the development of new RNA-Seq library preparation methods, and the increasing availability of bioinformatics tools. The major players in the RNA-Seq market include Thermo Fisher Scientific, Illumina, BGI, PacBio, Genewiz, Macrogen, LabCorp, Roche, Qiagen, Eurofins, Novo Gene, Berry Genomics, LC Sciences, Canopy Biosciences, and Hologic. The market is fragmented, with the top players accounting for a significant share. The market is expected to witness significant growth in the coming years, driven by the factors mentioned above.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Results obtained from transcriptional profiling of genes of the isolated yeast strains using RNAseq analysis. Excel file with differential expression of genes and counts per million of yeast genes for the studied strains.
RNAseq comparing wt strain PcPCL1606 and the derivative mutant ΔdarB, defective in HPR production. RNA was extracted from the rhizosphere samples using a PowerSoil® RNA extraction kit (Qiagen Iberia S.L., Madrid, Spain) following the manufacturer's instructions, and its amount was quantified using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). For the RNAseq experiment, the quantity and quality of RNA were verified by the Genomics and Ultrasequencing Service Unit (University of Malaga), and the RNA was subsequently sequenced using NextSeq550 equipment (Illumina). Processing of the raw reads was carried out by the Centre for Supercomputing and Bioinnovation (University of Malaga). The bacterial RNAseq data analysis was performed with a series of software packages adapted to the experimental model. The software components of the RNAseq analysis pipeline included SeqTrimNext (v.2.0.6) to remove low-quality reads, adapters, organellar DNA and contaminant sequences; BOWTIE (v.2.2.9) to align reads to the genomic reference; Samtools (v. 0.1.19), a package of programs for reading, writing, editing or viewing alignment files in SAM/BAM format (http://www.htslib.org/); and TUXEDO tools (http://cole-trapnell-lab.github.io/cufflinks/manual/), used to assign the aligned RNAseq reads to the different transcripts and estimate their abundance. The abundance of the transcripts was measured in fragments per kilobase of exon per million mapped reads (FPKM). Once the transcripts had been assembled and their FPKM values estimated, the transcripts were annotated with the known reference set of genes obtained from the annotated reference file.
This pipeline is a tool developed by the Andalusian Platform for Bioinformatics (PAB; http://www.scbi.uma.es/site/omics/bioinformatics) for differential expression analysis of RNAseq data on a genomic reference. The subsequent differential expression analysis, performed with a method analogous to that used for differentially expressed sequences, and the graphical representation of the expression results were done using the 'cummeRbund' R package (v. 2.42.0). The resulting FPKM matrix was used to obtain a list of differentially expressed genes with a p-value below 0.05.
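To make the FPKM unit concrete, the sketch below applies the textbook definition: fragments assigned to a transcript, divided by its exon length in kilobases and by the library size in millions of mapped fragments. It does not reproduce the statistical estimator used by the Cufflinks tools. The function name and the input numbers are made up for illustration.

```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million mapped fragments."""
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # Dividing by (length / 1e3) and by (library size / 1e6) gives the 1e9 factor.
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)

# Made-up numbers: 100 fragments on a 2 kb transcript, 10 million mapped fragments.
value = fpkm(100, 2000, 10_000_000)
print(value)
```

Normalizing by both length and library size is what makes values comparable across transcripts of different lengths and across samples sequenced to different depths.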
U.S. Government Works https://www.usa.gov/government-works
The space environment is suspected to generate reactive oxygen species (ROS) and induce oxidative stress in plants; however, little is known about the expression of the ROS gene network in plants grown during long-term space flight. RNA-Seq was used to define the large-scale gene expression profiles of Mizuna harvested after 27 days of cultivation in the International Space Station, to understand the molecular response and adaptation to the space environment. Results: RNA-Seq of transcripts from Mizuna grown in the International Space Station and on the ground, using next-generation sequencing technology, showed 8,258 and 14,170 transcripts up- and down-regulated, respectively, in the space-grown Mizuna compared with the ground-grown Mizuna. A total of 20 of 32 ROS oxidative marker genes were up-regulated, including high expression of 4 hallmarks, and the preferentially expressed genes associated with ROS scavenging were thioredoxin, glutaredoxin, and alternative oxidase genes. Among the transcription factors of the ROS gene network, the MAP cascades MEKK1-MKK4-MPK3, OXI1-MKK4-MPK3, and OXI1-MPK3, induction of WRKY22 by the MEKK1-MKK4-MPK3 cascade, induction of WRKY25, and repression of ZAT7 by Zat12 were suggested. Among the NADPH oxidase genes, which produce ROS, RbohD and RbohF were preferentially up-regulated. Conclusions: Our large-scale transcriptome analysis demonstrated that the space environment induced oxidative stress and that the ROS gene network was activated in the space-grown Mizuna. Some of the activated genes were commonly up-regulated by abiotic and biotic stress, and some were preferentially up-regulated by the space environment, even though Mizuna grew in space as well as on the ground, showing that plants could acclimate to the space environment by reprogramming the expression of the ROS gene network.
Use of archival resources has been limited to date by inconsistent methods for genomic profiling of degraded RNA from formalin-fixed paraffin-embedded (FFPE) samples. RNA-seq offers a novel way to address this problem. In this study we evaluated transcriptomic dose responses using RNA-seq in paired FFPE and frozen (FROZ) samples from two archival studies in mice, one recent and one older (>20 years old). Experimental treatments included di(2-ethylhexyl)phthalate (DEHP) and dichloroacetic acid (DCA) for the recent and >20 year-old studies, respectively. Total RNA was ribodepleted and sequenced using the Illumina HiSeq platform. In the recent study, FFPE samples showed high concordance in total reads (98% vs FROZ), fold-change values of differentially expressed genes (DEGs) (R² = 0.99), highly enriched target pathways (90% overlap with FROZ), and benchmark dose estimates for preselected target genes (-2% overall vs FROZ). In contrast, RNA-seq data from older FFPE samples had lower total reads (70% vs FROZ) and poor concordance in global DEGs and pathways. Despite a 99% loss of counts, dose responses were still evident for target genes in FFPE samples and positively correlated with paired FROZ samples. These findings highlight potential variability in the quality of RNA-seq data from FFPE samples. More recent FFPE samples were highly similar to FROZ samples in sequencing quality metrics, DEG profiles, and dose-response parameters, while further methods development is needed for older or lower-quality FFPE samples. This work should help broaden the use of archival resources in both chemical safety and translational science. This dataset is associated with the following publication: Hester, S., V. Bhat, B. Chorley, G. Carswell, W. Jones, L. Wehmas, and C. Wood. Dose-Response Analysis of RNA-Seq Profiles in Archival Formalin-Fixed Paraffin-Embedded (FFPE) Samples. TOXICOLOGICAL SCIENCES. Society of Toxicology, 154(2): 202-213, (2016).
Traveling to nearby extraterrestrial objects that have a reduced gravity level (partial gravity) compared to Earth's gravity is becoming a realistic objective for space agencies. The use of plants as part of life support systems will require a better understanding of the interactions among plant growth responses, including tropisms, under partial gravity conditions. Here we present results from the Seedling Growth space experiments on the ISS to complement the previously released GLDS-251 dataset, which included seeds of Arabidopsis thaliana wildtype plants. Seeds were germinated and seedlings grew for six days under different gravity levels, namely micro-g, several intermediate partial-g levels, and 1g, and were subjected to irradiation with blue light for the last 48 hours. RNA was extracted from 20 wildtype samples for subsequent RNAseq analysis in GLDS-251; here we add 36 samples from similarly exposed PhyA and PhyB mutants.