100+ datasets found

A Next-Generation Sequencing Data Analysis Pipeline for Detecting Unknown...
plos.figshare.com
pdf
Updated Jun 3, 2023
Cite
Yu-Nong Gong; Guang-Wu Chen; Shu-Li Yang; Ching-Ju Lee; Shin-Ru Shih; Kuo-Chien Tsao (2023). A Next-Generation Sequencing Data Analysis Pipeline for Detecting Unknown Pathogens from Mixed Clinical Samples and Revealing Their Genetic Diversity [Dataset]. http://doi.org/10.1371/journal.pone.0151495
Available download formats: pdf
Unique identifier
https://doi.org/10.1371/journal.pone.0151495
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Yu-Nong Gong; Guang-Wu Chen; Shu-Li Yang; Ching-Ju Lee; Shin-Ru Shih; Kuo-Chien Tsao
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Forty-two cytopathic effect (CPE)-positive isolates were collected from 2008 to 2012. None of the isolates could be identified as a known viral pathogen by routine diagnostic assays. They were pooled into 8 groups of 5–6 isolates to reduce the sequencing cost. Next-generation sequencing (NGS) was conducted for each group of mixed samples, and the proposed data analysis pipeline was used to identify viral pathogens in these mixed samples. Polymerase chain reaction (PCR) or enzyme-linked immunosorbent assay (ELISA) was individually conducted for each of these 42 isolates depending on the predicted viral types in each group. Two isolates remained unknown after these tests. Iteration mapping was then implemented for each of these 2 isolates, and predicted human parechovirus (HPeV) in both. In summary, our NGS pipeline detected the following viruses among the 42 isolates: 29 human rhinoviruses (HRVs), 10 HPeVs, 1 human adenovirus (HAdV), 1 echovirus and 1 rotavirus. We then focused on the 10 identified Taiwanese HPeVs because of their reported clinical significance over HRVs. Their genomes were assembled and their genetic diversity was explored. One novel 6-bp deletion was found in one HPeV-1 virus. In terms of nucleotide heterogeneity, 64 genetic variants were detected from these HPeVs using the mapped NGS reads. Most importantly, a recombination event was found between our HPeV-3 and a known HPeV-4 strain in the database. A similar event was detected in the other HPeV-3 strains in the same clade of the phylogenetic tree. These findings demonstrated that the proposed NGS data analysis pipeline identified unknown viruses from the mixed clinical samples, revealed their genetic identity and variants, and characterized their genetic features in terms of viral evolution.
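The counts in this description can be cross-checked with a little arithmetic. The sketch below is only a consistency check on the numbers quoted above (42 isolates, 8 groups of 5–6, and the per-virus totals); it says nothing about how the authors actually assigned isolates to groups.

```python
# All numbers are taken from the description above.
n_isolates = 42
n_groups = 8

# Find whole-number splits: a groups of 5 and (8 - a) groups of 6 that total 42.
splits = [(a, n_groups - a) for a in range(n_groups + 1)
          if 5 * a + 6 * (n_groups - a) == n_isolates]
print(splits)  # [(6, 2)]: six groups of 5 isolates and two groups of 6

# The reported virus counts should add back up to the 42 isolates.
detected = {"HRV": 29, "HPeV": 10, "HAdV": 1, "echovirus": 1, "rotavirus": 1}
total_detected = sum(detected.values())
print(total_detected)  # 42
```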
Preliminary NGS prediction and PCR or ELISA detection.
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Jun 1, 2023
Cite
Yu-Nong Gong; Guang-Wu Chen; Shu-Li Yang; Ching-Ju Lee; Shin-Ru Shih; Kuo-Chien Tsao (2023). Preliminary NGS prediction and PCR or ELISA detection. [Dataset]. http://doi.org/10.1371/journal.pone.0151495.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0151495.t002
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Yu-Nong Gong; Guang-Wu Chen; Shu-Li Yang; Ching-Ju Lee; Shin-Ru Shih; Kuo-Chien Tsao
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Preliminary NGS prediction and PCR or ELISA detection.
CusVarDB: A tool for building customized sample-specific variant protein...
zenodo.org
zip
Updated Oct 23, 2020
Cite
Sandeep Kasaragod; Varshasnata Mohanty; Ankur Tyagi; Santosh Kumar Behera; Arun H. Patil; Sneha M. Pinto; Harsha Gowda; Prashant Kumar Modi (2020). CusVarDB: A tool for building customized sample-specific variant protein database from Next-generation sequencing datasets [Dataset]. http://doi.org/10.5281/zenodo.3747108
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.3747108
Dataset updated
Oct 23, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Sandeep Kasaragod; Varshasnata Mohanty; Ankur Tyagi; Santosh Kumar Behera; Arun H. Patil; Sneha M. Pinto; Harsha Gowda; Prashant Kumar Modi
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CusVarDB is a Windows-based tool for creating a variant protein database from next-generation sequencing datasets. The program supports variant calling for genome, RNA-Seq and exome datasets.

This repository provides the variant peptides identified in our study and their corresponding information. The tables are described below.

Supplementary Table 1. This table contains the resultant variant peptides along with their wild-type peptides from the BT474, MDMAB157, MFM223, and HCC38 datasets. Along with mutant peptides, this section also provides additional information such as peptide-spectrum match (PSM), protein accession, cross-correlation value from the search (Xcorr) and retention time (RT).

Supplementary Table 2. This table provides the complete details of the resultant peptides. The mutant and corresponding wild-type peptides are listed in different sheets. For a given mutant peptide, its wild-type peptide and corresponding information can be mapped using the VLOOKUP function in Excel, with column A (Sl.No) as the lookup parameter.
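The VLOOKUP mapping described for Supplementary Table 2 amounts to a keyed join on column A (Sl.No). A minimal sketch of the same idea outside Excel follows; the rows, the `peptide` field name and the `lookup_wild_type` helper are all hypothetical, since the source only names the Sl.No column.

```python
# Made-up rows standing in for the two sheets. Only the "Sl.No" key column
# is named in the source; the "peptide" field is an illustrative assumption.
mutant_sheet = [
    {"Sl.No": 1, "peptide": "AAPEPTIDEK"},
    {"Sl.No": 2, "peptide": "GGSAMPLER"},
]
wild_type_sheet = [
    {"Sl.No": 1, "peptide": "AAPEPTIDER"},
    {"Sl.No": 2, "peptide": "GGSAMPLEK"},
]


def lookup_wild_type(mutants, wild_types, key="Sl.No"):
    """Pair each mutant row with the wild-type row that shares the same key."""
    index = {row[key]: row for row in wild_types}  # build the lookup once
    return [(m, index.get(m[key])) for m in mutants]  # None when no match exists


pairs = lookup_wild_type(mutant_sheet, wild_type_sheet)
for mutant, wild in pairs:
    print(mutant["Sl.No"], mutant["peptide"], "->",
          wild["peptide"] if wild else "no match")
```

Building the index once keeps the lookup linear in the number of rows, which is the main practical difference from repeating VLOOKUP per cell.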

Bioinformatic pipeline from: Increasing confidence for discerning species...

datadryad.org

zip

Updated Jan 21, 2021

Cite

Matt Snyder (2021). Bioinformatic pipeline from: Increasing confidence for discerning species and population compositions from metabarcoding assays of environmental samples: case studies of fishes in the Laurentian Great Lakes and Wabash River [Dataset]. http://doi.org/10.5061/dryad.7m0cfxprx

Available download formats: zip

Unique identifier

https://doi.org/10.5061/dryad.7m0cfxprx

Dataset updated

Jan 21, 2021

Dataset provided by

Dryad

Authors

Matt Snyder

Time period covered

Aug 14, 2020

Area covered

Wabash River, The Great Lakes

Description

Pipeline overview

Demultiplexed raw reads returned from an Illumina HTS platform were trimmed with MetaTrim.py (see MetaTrim_README.md)
Trimmed reads were merged in the R package Dada2 following Dada2Workflow.R
The resulting sequence table was split into per-sample FASTA files by SeqTabToFasta.pl
FASTAs were subjected to a BLAST search against multiple custom databases with BlastCycle500.pl
BLAST results were summarized with SummarizeBlast.pl

Scripts and usage:

MetaTrim.py: See MetaTrim_README.md
Dada2Workflow.R: workflow for Dada2 R package
SeqTabToFasta.pl: Run in the directory containing the sequence table returned from Dada2. The sequence table must be named SeqTab.txt. Creates a subdirectory called Dada2ASVs and places a FASTA file for each sample in it. Sequence titles in these FASTA files have the format > <ASV #> | <# of reads>.
BlastCycle500.pl: Run in Dada2ASVs. Performs a BLAST search for each ASV in each FASTA against custom databases, returning the top 500 res...
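The per-sample FASTA layout described for SeqTabToFasta.pl can be illustrated with a short sketch. This is not the author's Perl script: the `sample_to_fasta` function, the tuple layout of its input, the exact spacing of the title line and the choice to skip zero-count ASVs are all assumptions made for illustration.

```python
def sample_to_fasta(asv_counts):
    """Format one sample's ASVs as FASTA text.

    asv_counts: list of (asv_number, sequence, read_count) tuples (assumed layout).
    Titles follow the pattern quoted above: > <ASV #> | <# of reads>.
    ASVs with zero reads in the sample are skipped (an assumption).
    """
    lines = []
    for asv_number, sequence, read_count in asv_counts:
        if read_count == 0:
            continue
        lines.append(f"> {asv_number} | {read_count}")
        lines.append(sequence)
    return "\n".join(lines) + "\n"


# Made-up data for illustration only.
example = [(1, "ACGTACGT", 120), (2, "TTGACCA", 0), (3, "GGCATTA", 7)]
fasta_text = sample_to_fasta(example)
print(fasta_text)
```

Carrying the read count in the title line lets downstream steps weight BLAST hits by abundance without re-reading the sequence table.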

Data from: CAPRG: Sequence Assembling Pipeline for Next Generation...
datasetcatalog.nlm.nih.gov
Updated Feb 3, 2012
Cite
George, Glover; Perkins, Edward J.; Pham, Don; Elasri, Mohamed O.; Rawat, Arun; Scanlan, Leona D.; Gust, Kurt A.; Vulpe, Chris (2012). CAPRG: Sequence Assembling Pipeline for Next Generation Sequencing of Non-Model Organisms [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001144468
Dataset updated
Feb 3, 2012
Authors
George, Glover; Perkins, Edward J.; Pham, Don; Elasri, Mohamed O.; Rawat, Arun; Scanlan, Leona D.; Gust, Kurt A.; Vulpe, Chris
Description
Our goal is to introduce and describe the utility of a new pipeline “Contigs Assembly Pipeline using Reference Genome” (CAPRG), which has been developed to assemble “long sequence reads” for non-model organisms by leveraging a reference genome of a closely related phylogenetic relative. To facilitate this effort, we utilized two avian transcriptomic datasets generated using ROCHE/454 technology as test cases for CAPRG assembly. We compared the results of CAPRG assembly using a reference genome with the results of existing methods that utilize de novo strategies such as VELVET, PAVE, and MIRA by employing parameter space comparisons (intra-assembling comparison). CAPRG performed as well or better than the existing assembly methods based on various benchmarks for “gene-hunting.” Further, CAPRG completed the assemblies in a fraction of the time required by the existing assembly algorithms. Additional advantages of CAPRG included reduced contig inflation resulting in lower computational resources for annotation, and functional identification for contigs that may be categorized as “unknowns” by de novo methods. In addition to providing evaluation of CAPRG performance, we observed that the different assembly (inter-assembly) results could be integrated to enhance the putative gene coverage for any transcriptomics study.
Data from: Template-specific optimization of NGS genotyping pipelines...
search.dataone.org
data.niaid.nih.gov
Updated Jul 26, 2025
Cite
Artemis Efstratiou; Arnaud Gaigher; Sven Künzel; Ana Teles; Tobias L. Lenz (2025). Template-specific optimization of NGS genotyping pipelines reveals allele-specific variation in MHC gene expression [Dataset]. http://doi.org/10.5061/dryad.qfttdz0qb
Unique identifier
https://doi.org/10.5061/dryad.qfttdz0qb
Dataset updated
Jul 26, 2025
Dataset provided by
Dryad Digital Repository
Authors
Artemis Efstratiou; Arnaud Gaigher; Sven Künzel; Ana Teles; Tobias L. Lenz
Time period covered
Jan 1, 2024
Description
Using high-throughput sequencing for precise genotyping of multi-locus gene families, such as the Major Histocompatibility Complex (MHC), remains challenging, due to the complexity of the data and difficulties in distinguishing genuine from erroneous variants. Several dedicated genotyping pipelines for data from high-throughput sequencing, such as next-generation sequencing (NGS), have been developed to tackle the ensuing risk of artificially inflated diversity. Here, we thoroughly assess three such multi-locus genotyping pipelines for NGS data, the DOC method, AmpliSAS and ACACIA, using MHC class IIβ datasets of three-spined stickleback gDNA, cDNA, and “artificial” plasmid samples with known allelic diversity. We show that genotyping of gDNA and plasmid samples at optimal pipeline parameters was highly accurate and reproducible across methods. However, for cDNA data, gDNA-optimal parameter configuration yielded decreased overall genotyping precision and consistency between pipelines. F...
Template-specific optimization of NGS genotyping pipelines reveals allele-specific variation in MHC gene expression

Description of the data and file structure

This submission consists of two Excel files.

The file 'Data_MHC-I' includes information regarding the 10 three-spined stickleback families included in our MHC-I genotyping dataset, and is separated into three sheets:

(i) Families overview, with information regarding the number of offspring and individual IDs of the families (columns: family ID, and corresponding offspring IDs)

(ii) Family genotypes (columns: Family ID, Inferred Parental Genotype1, Inferred Parental Genotype2, Observed Offspring Genotypes, Number of Alleles Per Genotype, and Number of Offspring), and

(iii) Allele segregation by family, where a table is presented for each of the 10 families used to infer the genetic linkage between MHC-I loci of the three-spined stickleback.

The file 'Data_MHC-II' includes the genotypes of all samples included in our M...
Overview of the parameters investigated for the variant calling pipeline...
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Sarah Sandmann; Aniek O. de Graaf; Bert A. van der Reijden; Joop H. Jansen; Martin Dugas (2023). Overview of the parameters investigated for the variant calling pipeline with GLM. [Dataset]. http://doi.org/10.1371/journal.pone.0171983.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0171983.t001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Sarah Sandmann; Aniek O. de Graaf; Bert A. van der Reijden; Joop H. Jansen; Martin Dugas
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overview of the parameters investigated for the variant calling pipeline with GLM.
MutAid: Sanger and NGS Based Integrated Pipeline for Mutation...
plos.figshare.com
txt
Updated Jun 3, 2023
Cite
Ram Vinay Pandey; Stephan Pabinger; Albert Kriegner; Andreas Weinhäusel (2023). MutAid: Sanger and NGS Based Integrated Pipeline for Mutation Identification, Validation and Annotation in Human Molecular Genetics [Dataset]. http://doi.org/10.1371/journal.pone.0147697
Available download formats: txt
Unique identifier
https://doi.org/10.1371/journal.pone.0147697
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Ram Vinay Pandey; Stephan Pabinger; Albert Kriegner; Andreas Weinhäusel
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Traditional Sanger sequencing as well as Next-Generation Sequencing have been used for the identification of disease causing mutations in human molecular research. The majority of currently available tools are developed for research and explorative purposes and often do not provide a complete, efficient, one-stop solution. As the focus of currently developed tools is mainly on NGS data analysis, no integrative solution for the analysis of Sanger data is provided and consequently a one-stop solution to analyze reads from both sequencing platforms is not available. We have therefore developed a new pipeline called MutAid to analyze and interpret raw sequencing data produced by Sanger or several NGS sequencing platforms. It performs format conversion, base calling, quality trimming, filtering, read mapping, variant calling, variant annotation and analysis of Sanger and NGS data under a single platform. It is capable of analyzing reads from multiple patients in a single run to create a list of potential disease causing base substitutions as well as insertions and deletions. MutAid has been developed for expert and non-expert users and supports four sequencing platforms including Sanger, Illumina, 454 and Ion Torrent. Furthermore, for NGS data analysis, five read mappers including BWA, TMAP, Bowtie, Bowtie2 and GSNAP and four variant callers including GATK-HaplotypeCaller, SAMTOOLS, Freebayes and VarScan2 pipelines are supported. MutAid is freely available at https://sourceforge.net/projects/mutaid.
Supporting data for the "Development of tailored NGS data analysis pipeline...
datahub.hku.hk
Updated Oct 7, 2022
Cite
Yao Lei (2022). Supporting data for the "Development of tailored NGS data analysis pipeline for the diagnosis of Neuromuscular disorders" [Dataset]. http://doi.org/10.25442/hku.21184174.v1
Unique identifier
https://doi.org/10.25442/hku.21184174.v1
Dataset updated
Oct 7, 2022
Dataset provided by
HKU Data Repository
Authors
Yao Lei
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This dataset is an Excel file that summarises information on patients in whom potential causal variant(s), or VUS(s) incompatible with the clinical diagnosis, were found. It includes patients' gender, symptom onset age, age at last follow-up, clinical presentation, provisional clinical diagnosis, prior genetic tests and results, availability of the WES and WGS data, and WES and WGS of their parents.

The first sheet lists the patients in whom potential causal variants were found. The last three columns are the identified potential causal variants, gene of the variants, inheritance model, and ACMG guideline classification of the variants.

The second sheet lists the patients in whom VUS(s) incompatible with the clinical diagnosis were found. The last three columns are the identified VUS(s) incompatible with the clinical diagnosis, gene of the VUS(s), and ACMG guideline classification of the VUS(s).
Development of containerized pipelines for the reproducible analysis of...
scholardata.sun.ac.za
zip
Updated Feb 7, 2025
Cite
RUVARASHE JOYLYNE Madzime (2025). Development of containerized pipelines for the reproducible analysis of amplicon sequence-, shotgun metagenomic- and metatranscriptomic data [Dataset]. http://doi.org/10.25413/sun.25383232.v1
Available download formats: zip
Unique identifier
https://doi.org/10.25413/sun.25383232.v1
Dataset updated
Feb 7, 2025
Dataset provided by
SUNScholarData
Authors
RUVARASHE JOYLYNE Madzime
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Development of 3 independent containerized pipelines to analyse shotgun metagenomic, amplicon sequencing and metatranscriptomic data. The pipelines are meant to improve reproducibility in analysing these data. Containers were developed using Singularity for efficient use in HPC environments. The pipelines were developed using Nextflow. The pipelines were tested with their respective data on a local server, Aither, for the server environment and at the Centre of High Performance Computing (CHPC) for the cluster environment. These files are table outputs from running the amplicon sequence pipeline on the cluster and server.
Matching mutant raw reads example.
plos.figshare.com
docx
Updated Jun 1, 2023
Cite
Kinga M. Bujakowska; Joseph White; Emily Place; Mark Consugar; Jason Comander (2023). Matching mutant raw reads example. [Dataset]. http://doi.org/10.1371/journal.pone.0142614.s001
Available download formats: docx
Unique identifier
https://doi.org/10.1371/journal.pone.0142614.s001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Kinga M. Bujakowska; Joseph White; Emily Place; Mark Consugar; Jason Comander
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Using the command “zgrep GAAAAAAGGAGGCCGGGCGCGGT D00379_000148_GCCAAT_L001_R2_001.fastq.gz”, 23 reads were obtained. The reads were aligned manually for display purposes and the sequence matching the probe was underlined. A space was added before the canonical 5’ end of the Alu insertion (GGCCGGG…). The read length of 121 bp was too short to span the entire Alu insertion (even if each read was computationally merged with its mate pair, not shown). (DOCX)
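For readers without zgrep, the same search can be sketched with Python's standard library. The `count_matching_lines` helper and the tiny demo file are hypothetical; only the probe sequence comes from the command quoted above. A plain substring test stands in for zgrep's pattern match, which is equivalent here because the probe contains only letters.

```python
import gzip
import os
import tempfile


def count_matching_lines(path, pattern):
    """Count lines in a gzip-compressed text file that contain `pattern`."""
    hits = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if pattern in line:
                hits += 1
    return hits


probe = "GAAAAAAGGAGGCCGGGCGCGGT"  # probe sequence from the zgrep command above

# Build a two-read demo FASTQ so the sketch runs without the real data file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.fastq.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write("@read1\nTTT" + probe + "ACG\n+\n" + "I" * 29 + "\n")
        out.write("@read2\nACGTACGT\n+\nIIIIIIII\n")
    demo_hits = count_matching_lines(demo_path, probe)

print(demo_hits)  # 1
```

Like zgrep, this only finds the probe on the strand as written; reads carrying the reverse complement would need a second search.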

Files for publication "Microseek: A Protein-Based Metagenomic Pipeline for...

zenodo.org

Updated Jul 29, 2022

+ more versions

Cite

Thomas Bigot; Philippe Pérot; Sarah Temmam; Béatrice Regnault; Marc Eloit (2022). Files for publication "Microseek: A Protein-Based Metagenomic Pipeline for Virus Diagnostic and Discovery" [Dataset]. http://doi.org/10.5281/zenodo.4475261

Available download formats: xz

Unique identifier

https://doi.org/10.5281/zenodo.4475261

Dataset updated

Jul 29, 2022

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Thomas Bigot; Philippe Pérot; Sarah Temmam; Béatrice Regnault; Marc Eloit

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Context

These files correspond to the article “Microseek: A Protein-Based Metagenomic Pipeline for Virus Diagnostic and Discovery” submitted to Genes.

File content

empty_matrices: 50M-read Tissues and Plasma matrices, no spike;
matrices_spiked_known_viruses: 50M-read Tissues and Plasma matrices spiked with six known viruses at d1, d10, d100;
matrices_spiked_neo_viruses: 50M-read Tissues and Plasma matrices spiked with 3 Neopneumoviruses at d1 and d10;
neo_viruses: Nucleotide and protein sequences of 3 Neopneumoviruses
output_microseek: Microseek outputs, raw results and results after background filtration

File listing

empty_matrices.tar.xz
├── plasma.fastq 
└── tissue.fastq 

matrices_spiked_known_viruses
├── d1 
│  ├── spiked_plasma.fastq
│  └── spiked_tissue.fastq
├── d10 
│  ├── spiked_plasma.fastq
│  └── spiked_tissue.fastq
└── d100 
  ├── spiked_plasma.fastq
  └── spiked_tissue.fastq

matrices_spiked_neo_viruses.tar.xz
├── d1 
│  ├── plasma_spiked_with_neo1.fastq
│  ├── plasma_spiked_with_neo2.fastq
│  ├── plasma_spiked_with_neo3.fastq
│  ├── tissue_spiked_with_neo1.fastq
│  ├── tissue_spiked_with_neo2.fastq
│  └── tissue_spiked_with_neo3.fastq
└── d10 
  ├── plasma_spiked_with_neo1.fastq
  ├── plasma_spiked_with_neo2.fastq
  ├── plasma_spiked_with_neo3.fastq
  ├── tissue_spiked_with_neo1.fastq
  ├── tissue_spiked_with_neo2.fastq
  └── tissue_spiked_with_neo3.fastq

neo_viruses.tar.xz
├── genes 
│  ├── neo_1.fasta
│  ├── neo_2.fasta
│  └── neo_3.fasta
└── proteins 
  ├── neo_1.fasta
  ├── neo_2.fasta
  └── neo_3.fasta

output_microseek.tar.xz
├── empty_matrices
│  ├── matrix_plasma 
│  └── matrix_tissue 
├── matrices_spiked_known_viruses
│  ├── filtered
│  │  ├── d100_plasma 
│  │  ├── d100_tissue 
│  │  ├── d10_plasma 
│  │  ├── d10_tissue 
│  │  ├── d1_plasma 
│  │  └── d1_tissue 
│  └── non_filtered
│    ├── d100_plasma 
│    ├── d100_tissue 
│    ├── d10_plasma 
│    ├── d10_tissue 
│    ├── d1_plasma 
│    └── d1_tissue 
└── matrices_spiked_neo_viruses
  ├── filtered
  │  ├── plasma_spiked_with_neo1_at_d1 
  │  ├── plasma_spiked_with_neo1_at_d10 
  │  ├── plasma_spiked_with_neo2_at_d1 
  │  ├── plasma_spiked_with_neo2_at_d10 
  │  ├── plasma_spiked_with_neo3_at_d1 
  │  ├── plasma_spiked_with_neo3_at_d10 
  │  ├── tissue_spiked_with_neo1_at_d1 
  │  ├── tissue_spiked_with_neo1_at_d10 
  │  ├── tissue_spiked_with_neo2_at_d1 
  │  ├── tissue_spiked_with_neo2_at_d10 
  │  ├── tissue_spiked_with_neo3_at_d1 
  │  └── tissue_spiked_with_neo3_at_d10 
  └── non-filtered
    ├── plasma_spiked_with_neo1_at_d1 
    ├── plasma_spiked_with_neo1_at_d10 
    ├── plasma_spiked_with_neo2_at_d1 
    ├── plasma_spiked_with_neo2_at_d10 
    ├── plasma_spiked_with_neo3_at_d1 
    ├── plasma_spiked_with_neo3_at_d10 
    ├── tissue_spiked_with_neo1_at_d1 
    ├── tissue_spiked_with_neo1_at_d10 
    ├── tissue_spiked_with_neo2_at_d1 
    ├── tissue_spiked_with_neo2_at_d10 
    ├── tissue_spiked_with_neo3_at_d1 
    └── tissue_spiked_with_neo3_at_d10
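The archives listed above are .tar.xz files, which Python's standard `tarfile` module can read without unpacking everything to disk. The sketch below is an assumption-laden illustration: the `list_fastq_members` helper is hypothetical, and the demo archive is a tiny stand-in built on the fly rather than one of the real 50M-read files.

```python
import io
import os
import tarfile
import tempfile


def list_fastq_members(archive_path):
    """Return the sorted names of .fastq files stored in a .tar.xz archive."""
    with tarfile.open(archive_path, "r:xz") as archive:
        return sorted(member.name for member in archive.getmembers()
                      if member.isfile() and member.name.endswith(".fastq"))


# Build a small stand-in archive mirroring the empty_matrices layout.
with tempfile.TemporaryDirectory() as tmp:
    demo_archive = os.path.join(tmp, "demo.tar.xz")
    with tarfile.open(demo_archive, "w:xz") as archive:
        for name in ("plasma.fastq", "tissue.fastq"):
            payload = b"@r1\nACGT\n+\nIIII\n"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    members = list_fastq_members(demo_archive)

print(members)  # ['plasma.fastq', 'tissue.fastq']
```

Listing members first is a cheap way to confirm an archive matches the tree above before committing disk space to a full extraction.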

Collection of datasets containing the TaxaSE bacterial taxonomic annotation...
researchdata.edu.au
Updated May 16, 2023
Cite
Jeffries Thomas; Hamonts Kelly; Ijaz Ali (2023). Collection of datasets containing the TaxaSE bacterial taxonomic annotation pipeline, SILVA insilico datasets and Illumina sequencing data from sugarcane bacterial (16S) including subhabitats from soil, rhizosphere, stem and root [Dataset]. https://researchdata.edu.au/collection-datasets-containing-stem-root/2368242
Dataset updated
May 16, 2023
Dataset provided by
Western Sydney University
Authors
Jeffries Thomas; Hamonts Kelly; Ijaz Ali
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Feb 1, 2013 - Feb 28, 2017
Description
This dataset contains the TaxaSE bacterial taxonomic annotation pipeline (including its source code and associated data files). In silico data generated from the SILVA Release 123 database is also provided here, consisting of both whole SILVA and Removal of Taxa based validation approaches, which were used to compare the Shannon entropy based sequence similarity approach to Percentage Identity (via USEARCH v7.0.1090 32bit, see Edgar 2010). Lastly, the raw FASTQ files as well as processed FASTA files from sugarcane (Saccharum spp.) are included, consisting of samples from soil, rhizosphere, root and stem sub-habitats, alongside results generated in QIIME 1.9.1 (Caporaso et al. 2010).
The quality of all Illumina R1 and R2 reads was assessed visually using FASTQC (Andrews 2016); reads were merged using FLASH (Magoč & Salzberg 2011) and converted to FASTA format using QIIME’s “convert_fastaqual_fastq.py” script. Alpha diversity and beta diversity analyses were performed in QIIME, with TaxaSE results converted to QIIME-compatible format for comparison. In silico data was generated using the MicroSim simulator from the SILVA Release 123 database. Sugarcane leaf, stalk, root and rhizosphere soil samples were collected by Dr. Kelly Hamonts at Hawkesbury Institute for the Environment, Western Sydney University, Australia, in November 2014 from eight sugarcane fields growing three sugarcane varieties (KQ228, MQ239 and Q240) near Ingham, Queensland, Australia.
In each field, 3 stools were randomly selected and samples were collected from 2 plants per stool. Samples were snap-frozen in liquid nitrogen in the field, transported to the laboratory on dry ice and stored at -80 °C. Frozen sugarcane tissue samples were ground using mortar and pestle and DNA was extracted from the resulting powder using the MoBio PowerPlant DNA extraction kit, following the manufacturer’s instructions. The MoBio PowerSoil DNA extraction kit was used to extract DNA from the soil samples. Bacterial 16S rRNA amplicon sequencing was performed by the NGS facility at Western Sydney University using Illumina MiSeq (2 x 301 bp PE) and the 341F/805R primer set.
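The FASTQ-to-FASTA conversion step mentioned above is simple enough to sketch directly. This is not QIIME's convert_fastaqual_fastq.py; the `fastq_to_fasta` function is a hypothetical stand-in that assumes plain 4-line FASTQ records and simply drops the quality lines.

```python
def fastq_to_fasta(fastq_text):
    """Convert 4-line FASTQ records to FASTA text (quality lines are dropped)."""
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    out = []
    for i in range(0, len(lines), 4):
        header, sequence = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"record {i // 4 + 1} does not start with '@'")
        out.append(">" + header[1:])  # swap the FASTQ '@' for the FASTA '>'
        out.append(sequence)
    return "\n".join(out) + "\n"


# Made-up reads for illustration only.
demo_fastq = "@read1\nACGTTGCA\n+\nIIIIIIII\n@read2\nGGCCAATT\n+\nIIIIIIII\n"
demo_fasta = fastq_to_fasta(demo_fastq)
print(demo_fasta)
```

Stepping through records four lines at a time avoids misreading a quality line that happens to begin with '@'.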
Additional file 1 of Systematic and benchmarking studies of pipelines for...
datasetcatalog.nlm.nih.gov
springernature.figshare.com
Updated Apr 13, 2023
+ more versions
Cite
Sun, Lei; Yan, Qin; Zhang, Xin; Li, Qi-gang; Liu, Yong-feng; Yang, Wei; Lin, Qun-ting (2023). Additional file 1 of Systematic and benchmarking studies of pipelines for mammal WGBS data in the novel NGS platform [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001019533
Dataset updated
Apr 13, 2023
Authors
Sun, Lei; Yan, Qin; Zhang, Xin; Li, Qi-gang; Liu, Yong-feng; Yang, Wei; Lin, Qun-ting
Description
Additional file 1: Table S3. Property among GenoLab M, NextSeq X and NovaSeq 6000 platforms.

Global Chip Sequencing Market Research Report: By Application (Whole Genome...

wiseguyreports.com

Updated Jul 19, 2024

+ more versions

Cite

Wiseguy Research Consultants Pvt Ltd (2024). Global Chip Sequencing Market Research Report: By Application (Whole Genome Sequencing, Exome Sequencing, Single-Cell Sequencing, Metagenomics, RNA Sequencing), By Technology (Next-Generation Sequencing (NGS), Third-Generation Sequencing (TGS), Nanopore Sequencing, Single-Molecule Sequencing), By Sample Type (Blood, Saliva, Urine, Tissue, Tumor Biopsy), By End User (Hospitals & Clinics, Research Institutions, Pharmaceutical and Biotechnology Companies, Diagnostic Laboratories, Personalized Medicine Companies), By Data Analysis Pipeline (Data Preprocessing and Quality Control, Alignment and Variant Calling, Interpretation and Reporting, Cloud-Based Analysis, Open-Source Platforms) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/reports/chip-sequencing-market

Dataset updated

Jul 19, 2024

Dataset authored and provided by

Wiseguy Research Consultants Pvt Ltd

License

https://www.wiseguyreports.com/pages/privacy-policy

Time period covered

Jan 7, 2024

Area covered

Global

Description

BASE YEAR	2024
HISTORICAL DATA	2019 - 2024
REPORT COVERAGE	Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
MARKET SIZE 2023	19.37 (USD Billion)
MARKET SIZE 2024	21.65 (USD Billion)
MARKET SIZE 2032	52.8 (USD Billion)
SEGMENTS COVERED	Application, Technology, Sample Type, End User, Data Analysis Pipeline, Regional
COUNTRIES COVERED	North America, Europe, APAC, South America, MEA
KEY MARKET DYNAMICS	1. Technological advancements; 2. Rising demand for personalized medicine; 3. Growing prevalence of genetic diseases; 4. Rapidly expanding healthcare IT sector; 5. Increasing government funding for genetic research
MARKET FORECAST UNITS	USD Billion
KEY COMPANIES PROFILED	Oxford Nanopore Technologies, PerkinElmer, Macrogen, Pacific Biosciences, Illumina, Complete Genomics, 10x Genomics, Agilent Technologies, Geneplus, MGI Tech Co, Novogene, BioRad Laboratories, Thermo Fisher Scientific, BGI Group, WuXi NextCODE
MARKET FORECAST PERIOD	2024 - 2032
KEY MARKET OPPORTUNITIES	1. Advancements in single-cell sequencing; 2. Growing demand for precision medicine; 3. Increased accessibility to next-generation sequencing; 4. Technological advancements in chip design; 5. Expansion into emerging markets
COMPOUND ANNUAL GROWTH RATE (CAGR)	11.79% (2024 - 2032)
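The quoted growth rate can be reproduced from the two market-size figures in the table. The sketch below uses the standard compound annual growth rate formula; treating 2024 to 2032 as 8 compounding periods is an assumption, though it is the reading that matches the reported figure.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes in USD billion, taken from the table above.
rate = cagr(21.65, 52.8, 2032 - 2024)
print(f"{rate:.2%}")  # about 11.79%, in line with the reported CAGR
```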

NGS Data Analysis Services Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Jul 5, 2025
Cite
Growth Market Reports (2025). NGS Data Analysis Services Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/ngs-data-analysis-services-market
Available download formats: pdf, pptx, csv
Dataset updated
Jul 5, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
NGS Data Analysis Services Market Outlook

According to our latest research, the global NGS Data Analysis Services market size was valued at USD 1.95 billion in 2024, reflecting robust expansion driven by the increasing adoption of next-generation sequencing (NGS) technologies across various sectors. The market is projected to achieve a CAGR of 17.8% from 2025 to 2033, reaching an estimated value of USD 7.24 billion by 2033. This impressive growth trajectory is underpinned by the rising demand for precision medicine, advancements in genomics research, and the growing need for sophisticated bioinformatics solutions.

The primary growth factor for the NGS Data Analysis Services market is the exponential increase in genomic data generated by NGS platforms, necessitating advanced data analysis solutions. As sequencing costs continue to decline and throughput increases, research institutions, healthcare providers, and pharmaceutical companies are generating vast amounts of complex sequencing data. This surge in data volume has created a significant demand for specialized NGS data analysis services that can efficiently process, interpret, and transform raw sequencing data into actionable insights. The complexity of NGS data, which requires expertise in bioinformatics, machine learning, and cloud computing, has further fueled the reliance on third-party service providers offering end-to-end data analysis solutions.

Another critical driver is the expanding application of NGS technologies in clinical diagnostics, drug discovery, and personalized medicine. Clinical laboratories and hospitals are increasingly leveraging NGS data analysis services to identify genetic mutations, detect rare diseases, and guide targeted therapies. The integration of NGS into routine clinical workflows has accelerated the need for accurate and rapid data analysis, ensuring timely and precise patient care. In the pharmaceutical sector, NGS data analysis services are instrumental in biomarker discovery, pharmacogenomics, and the development of novel therapeutics, further propelling market growth. Additionally, the adoption of NGS in agriculture and animal research for crop improvement and disease resistance studies is broadening the market’s application scope.

The advancement of bioinformatics tools and cloud-based data analysis platforms is also contributing significantly to the growth of the NGS Data Analysis Services market. Cloud computing has revolutionized the way NGS data is managed, stored, and analyzed by offering scalable, secure, and cost-effective solutions. Many service providers now offer cloud-based platforms that facilitate seamless data sharing, collaboration, and real-time analysis, enabling researchers and clinicians to derive rapid insights from sequencing projects. The integration of artificial intelligence and machine learning algorithms into bioinformatics pipelines is enhancing the accuracy, efficiency, and scalability of NGS data analysis, thereby attracting a broader customer base.

From a regional perspective, North America continues to dominate the NGS Data Analysis Services market, accounting for the largest share in 2024, followed by Europe and Asia Pacific. The presence of leading genomic research institutes, favorable government initiatives, and significant investments in precision medicine and biotechnology are key factors driving the North American market. Europe is witnessing substantial growth due to increasing funding for genomics research and the expansion of clinical NGS applications. Meanwhile, Asia Pacific is emerging as a high-growth region, fueled by rising healthcare expenditure, growing awareness of genomics, and the establishment of new sequencing facilities. The Middle East & Africa and Latin America, while smaller in market size, are also showing steady progress as NGS adoption spreads globally.

Service Type Analysis

The Service Type segment of the NGS Data Analysis Services market encompasses a broad range of offerings, including Data Preproc
Training material for the course "Exome analysis with GALAXY"
zenodo.org
explore.openaire.eu
bin, txt, vcf
Updated Jan 24, 2020
Cite
Paolo Uva; Gianmauro Cuccuru (2020). Training material for the course "Exome analysis with GALAXY" [Dataset]. http://doi.org/10.5281/zenodo.61377
Explore at:
Available download formats: bin, txt, vcf
Unique identifier
https://doi.org/10.5281/zenodo.61377
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Paolo Uva; Gianmauro Cuccuru
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Galaxy is an open-source, web-based platform for data-intensive biomedical research. It makes bioinformatics applications accessible to users lacking programming skills, enabling them to easily build analysis workflows for NGS data.

The course "Exome analysis using Galaxy" is aimed at PhD students, biologists, clinicians and researchers who are analysing, or will need to analyse in the near future, high-throughput exome sequencing data. The aim of the course is to familiarise participants with the Galaxy platform and prepare them to work independently, using state-of-the-art tools for the analysis of exome sequencing data.

The course will be delivered using a mixture of lectures and computer-based hands-on practical sessions. Lectures will provide an up-to-date overview of the strategies for the analysis of exome next-generation sequencing experiments, starting from the raw sequence data. Analyses include sequence quality control, alignment to a reference genome, refinement of aligned sequences, variant calling, annotation and interpretation, and tools for visual inspection of results. Participants will apply the knowledge gained during the course to the analysis of real Illumina exome datasets, and implement workflows to reproduce the complete analysis. After the course, participants will be able to create pipelines for their individual analyses.

These are the datasets needed for this course.
Replication data for: "An easy-to-use pipeline to analyze amplicon-based...
b2find.eudat.eu
dataverse.csuc.cat
Updated Nov 24, 2024
Cite
(2024). Replication data for: "An easy-to-use pipeline to analyze amplicon-based Next Generation Sequencing results of human mitochondrial DNA from degraded samples" [Dataset]. https://b2find.eudat.eu/dataset/2fbe69c2-9db7-5bef-b978-59ac24a4cf75
Explore at:
Dataset updated
Nov 24, 2024
Description
The dataset contains the raw data, in FastQ format, of the sequences used to optimize the script "NCR-mtDNA_ampliconbasedngs", available in the GitHub repository named "DanielRCA/NCR-mtDNA_ampliconbasedngs". The dataset includes 163 samples (15 present-day samples and 148 ancient samples from before the 20th century). For each sample, there are two FastQ files, as the sequencing was performed in a paired-end format.
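Because each sample is described as having two FastQ files, a quick completeness check before running any pipeline is to group the files by sample and confirm that every sample has both mates. The sketch below is an illustration only: the `_R1`/`_R2` filename suffixes and the function name `pair_fastq_files` are assumptions, not taken from the dataset or from the NCR-mtDNA_ampliconbasedngs script.

```python
import re
from collections import defaultdict

# Assumed naming: <sample>_R1.fastq(.gz) and <sample>_R2.fastq(.gz).
MATE_PATTERN = re.compile(
    r"^(?P<sample>.+)_R(?P<mate>[12])\.(?:fastq|fq)(?:\.gz)?$"
)


def pair_fastq_files(filenames):
    """Group paired-end FastQ filenames by sample.

    Returns (pairs, problems): pairs maps sample -> (R1 file, R2 file);
    problems lists unrecognised filenames and samples missing a mate.
    """
    mates = defaultdict(dict)
    problems = []
    for name in filenames:
        match = MATE_PATTERN.match(name)
        if match is None:
            problems.append(name)
            continue
        mates[match["sample"]][match["mate"]] = name

    pairs = {}
    for sample, found in sorted(mates.items()):
        if set(found) == {"1", "2"}:
            pairs[sample] = (found["1"], found["2"])
        else:
            problems.append(sample)
    return pairs, problems


# Hypothetical filenames, for illustration only.
files = ["s01_R1.fastq", "s01_R2.fastq", "s02_R1.fastq", "notes.txt"]
pairs, problems = pair_fastq_files(files)
print(pairs)
print(problems)
```

For the dataset described here, a complete download would be expected to yield 163 pairs (326 files) and an empty `problems` list, provided the real filenames follow the assumed convention.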
Next Generation Sequencing Market Market Research Report 2033
researchintelo.com
csv, pdf, pptx
Updated Jul 24, 2025
Cite
Research Intelo (2025). Next Generation Sequencing Market Market Research Report 2033 [Dataset]. https://researchintelo.com/report/next-generation-sequencing-market-market
Explore at:
Available download formats: csv, pptx, pdf
Dataset updated
Jul 24, 2025
Dataset authored and provided by
Research Intelo
License
https://researchintelo.com/privacy-and-policy
Time period covered
2024 - 2033
Area covered
Global
Description
Next Generation Sequencing (NGS) Market Outlook

According to the latest research, the global Next Generation Sequencing (NGS) market size reached USD 14.8 billion in 2024. The market is demonstrating robust expansion, driven by rapid technological advancements and increasing adoption across healthcare and research sectors. The market is expected to register a CAGR of 16.2% from 2025 to 2033, propelling the global market size to approximately USD 47.7 billion by 2033. This impressive growth trajectory is primarily fueled by the rising demand for precision medicine, increasing investments in genomic research, and the expanding application of NGS technologies in clinical diagnostics and drug discovery.

One of the most significant growth factors for the Next Generation Sequencing market is the increasing prevalence of chronic and genetic diseases worldwide. The ability of NGS to provide high-throughput, accurate, and cost-effective sequencing has revolutionized how researchers and clinicians approach the diagnosis and treatment of complex diseases. With the global burden of cancer, rare genetic disorders, and infectious diseases on the rise, healthcare providers are increasingly adopting NGS-based solutions to enable early detection and personalized treatment strategies. Additionally, the growing awareness among patients and practitioners about the benefits of genomics in healthcare is further accelerating the adoption of NGS technologies. The continuous decrease in sequencing costs, paired with improved accuracy and speed, has made NGS accessible to a broader range of healthcare institutions, fueling market expansion.

Another key driver of market growth is the surge in research and development activities, particularly in the fields of genomics, transcriptomics, and epigenomics. Academic institutions, research organizations, and pharmaceutical companies are heavily investing in NGS technologies to facilitate large-scale genomic studies, biomarker discovery, and novel drug development. The integration of NGS platforms into drug discovery pipelines allows for a deeper understanding of disease mechanisms, identification of therapeutic targets, and development of targeted therapies. The rapid evolution of NGS technologies, such as single-molecule real-time sequencing and nanopore sequencing, is further enhancing the capabilities of researchers to generate comprehensive genomic data, thus propelling market growth. The increasing number of collaborative projects and government initiatives supporting genomics research is also creating a favorable environment for market expansion.

A third major growth factor is the broadening application spectrum of NGS beyond human healthcare. The technology is increasingly being utilized in agriculture, animal research, and environmental studies. In agriculture, NGS is used for crop improvement, disease resistance breeding, and food safety testing, enabling the development of high-yield, resilient crop varieties. In animal research, NGS is facilitating the study of genetic traits, disease susceptibility, and evolutionary biology. The versatility of NGS platforms and their ability to generate high-quality data across diverse sample types are making them indispensable tools in various scientific domains. As industries recognize the potential of genomics to address critical challenges, the demand for NGS solutions continues to rise, contributing to the overall growth of the market.

From a regional perspective, North America currently dominates the Next Generation Sequencing market, accounting for the largest share in 2024. This leadership is attributed to the presence of advanced healthcare infrastructure, significant investments in genomics research, and a high concentration of major market players. The region's strong regulatory framework and supportive reimbursement policies are also facilitating the adoption of NGS technologies. Europe follows as the second-largest market, driven by increasing government funding for genomic medicine and the presence of leading research institutions. Meanwhile, the Asia Pacific region is witnessing the fastest growth, fueled by rising healthcare expenditures, expanding research capabilities, and growing awareness of precision medicine. Latin America and the Middle East & Africa are emerging markets, showing steady growth due to improving healthcare infrastructure and increasing investments in biotechnology. Overall, the global NGS market is poised for significant expansion across all major regions, supported by technological innovation and growing demand for genomic solutions.
Example of a guide map file for use in the TRITEX genome assembly pipeline
doi.ipk-gatersleben.de
Updated Nov 18, 2020
Cite
Martin Mascher (2020). Example of a guide map file for use in the TRITEX genome assembly pipeline [Dataset]. https://doi.ipk-gatersleben.de/DOI/c6f2608f-6e17-45b5-bf9b-4a6fd0cc94ef/039f8fdd-7077-4158-9efd-712e0b485526/2
Explore at:
Dataset updated
Nov 18, 2020
Dataset provided by
e!DAL - Plant Genomics and Phenomics Research Data Repository (PGP), IPK Gatersleben, Seeland OT Gatersleben, Corrensstraße 3, 06466, Germany
Authors
Martin Mascher
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Example of a guide map file for use in the TRITEX assembly pipeline [doi:10.1186/s13059-019-1899-5]. The guide map is provided in RDS format (serialized R object) for direct use in the TRITEX pipeline, and in a tabular text file in TSV format. This example uses the POPSEQ genetic map of the barley genome [doi:10.1111/tpj.12319].

A Next-Generation Sequencing Data Analysis Pipeline for Detecting Unknown Pathogens from Mixed Clinical Samples and Revealing Their Genetic Diversity

Explore at:

12 scholarly articles cite this dataset (View in Google Scholar)


