100+ datasets found
  1. A Next-Generation Sequencing Data Analysis Pipeline for Detecting Unknown...

    • plos.figshare.com
    pdf
    Updated Jun 3, 2023
    Cite
    Yu-Nong Gong; Guang-Wu Chen; Shu-Li Yang; Ching-Ju Lee; Shin-Ru Shih; Kuo-Chien Tsao (2023). A Next-Generation Sequencing Data Analysis Pipeline for Detecting Unknown Pathogens from Mixed Clinical Samples and Revealing Their Genetic Diversity [Dataset]. http://doi.org/10.1371/journal.pone.0151495
    Available download formats: pdf
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Yu-Nong Gong; Guang-Wu Chen; Shu-Li Yang; Ching-Ju Lee; Shin-Ru Shih; Kuo-Chien Tsao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Forty-two cytopathic effect (CPE)-positive isolates were collected from 2008 to 2012. None of the isolates could be identified as a known viral pathogen by routine diagnostic assays. They were pooled into 8 groups of 5–6 isolates to reduce the sequencing cost. Next-generation sequencing (NGS) was conducted for each group of mixed samples, and the proposed data analysis pipeline was used to identify viral pathogens in these mixed samples. Polymerase chain reaction (PCR) or enzyme-linked immunosorbent assay (ELISA) was then conducted individually for each of the 42 isolates, depending on the viral types predicted in each group. Two isolates remained unknown after these tests. Iteration mapping was then implemented for each of these 2 isolates and predicted human parechovirus (HPeV) in both. In summary, our NGS pipeline detected the following viruses among the 42 isolates: 29 human rhinoviruses (HRVs), 10 HPeVs, 1 human adenovirus (HAdV), 1 echovirus and 1 rotavirus. We then focused on the 10 identified Taiwanese HPeVs because of their reported clinical significance over HRVs. Their genomes were assembled and their genetic diversity was explored. One novel 6-bp deletion was found in one HPeV-1 virus. In terms of nucleotide heterogeneity, 64 genetic variants were detected in these HPeVs using the mapped NGS reads. Most importantly, a recombination event was found between our HPeV-3 and a known HPeV-4 strain in the database. A similar event was detected in the other HPeV-3 strains in the same clade of the phylogenetic tree. These findings demonstrated that the proposed NGS data analysis pipeline identified unknown viruses from the mixed clinical samples, revealed their genetic identity and variants, and characterized their genetic features in terms of viral evolution.
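
The pooling arithmetic in this description can be checked with a short sketch. It assumes a balanced split (pool sizes differing by at most one); the isolate IDs are hypothetical, and the actual composition of the eight groups is not given in the description.

```python
from typing import List


def pool_isolates(isolate_ids: List[str], n_groups: int) -> List[List[str]]:
    """Split isolates into n_groups pools whose sizes differ by at most one."""
    base, extra = divmod(len(isolate_ids), n_groups)
    pools: List[List[str]] = []
    start = 0
    for group in range(n_groups):
        # The first `extra` pools take one additional isolate.
        size = base + (1 if group < extra else 0)
        pools.append(isolate_ids[start:start + size])
        start += size
    return pools


# Hypothetical isolate labels; the real sample IDs are not listed here.
isolates = [f"isolate_{i:02d}" for i in range(1, 43)]
pools = pool_isolates(isolates, 8)
pool_sizes = [len(pool) for pool in pools]

# Virus counts as reported in the description above.
detected = {"HRV": 29, "HPeV": 10, "HAdV": 1, "echovirus": 1, "rotavirus": 1}
total_detected = sum(detected.values())
```

Under this assumption, 42 isolates in 8 pools gives two pools of 6 and six pools of 5, which is consistent with the stated "groups of 5–6", and the reported virus counts sum to the 42 isolates.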

  2. Preliminary NGS prediction and PCR or ELISA detection.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 1, 2023
    Cite
    Yu-Nong Gong; Guang-Wu Chen; Shu-Li Yang; Ching-Ju Lee; Shin-Ru Shih; Kuo-Chien Tsao (2023). Preliminary NGS prediction and PCR or ELISA detection. [Dataset]. http://doi.org/10.1371/journal.pone.0151495.t002
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Yu-Nong Gong; Guang-Wu Chen; Shu-Li Yang; Ching-Ju Lee; Shin-Ru Shih; Kuo-Chien Tsao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Preliminary NGS prediction and PCR or ELISA detection.

  3. CusVarDB: A tool for building customized sample-specific variant protein...

    • zenodo.org
    zip
    Updated Oct 23, 2020
    Cite
    Sandeep Kasaragod; Varshasnata Mohanty; Ankur Tyagi; Santosh Kumar Behera; Arun H. Patil; Sneha M. Pinto; Harsha Gowda; Prashant Kumar Modi (2020). CusVarDB: A tool for building customized sample-specific variant protein database from Next-generation sequencing datasets [Dataset]. http://doi.org/10.5281/zenodo.3747108
    Available download formats: zip
    Dataset updated
    Oct 23, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sandeep Kasaragod; Varshasnata Mohanty; Ankur Tyagi; Santosh Kumar Behera; Arun H. Patil; Sneha M. Pinto; Harsha Gowda; Prashant Kumar Modi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CusVarDB is a Windows-based tool for creating a variant protein database from next-generation sequencing datasets. The program supports variant calling for genome, RNA-Seq and exome datasets.

    This repository provides the variant peptides identified in our study and their corresponding information. The tables are described below.

    Supplementary Table 1. This table contains the variant peptides along with their wild-type peptides from the BT474, MDMAB157, MFM223, and HCC38 datasets. Along with the mutant peptides, it provides additional information such as peptide-spectrum match (PSM), protein accession, cross-correlation value from the search (Xcorr) and retention time (RT).

    Supplementary Table 2. This table provides the complete details of the resulting peptides. The mutant and corresponding wild-type peptides are listed in different sheets. For a given mutant peptide, its wild-type peptide and corresponding information can be mapped using the VLOOKUP function in Excel, with column A (Sl.No) as the lookup key.
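
For readers working outside Excel, the VLOOKUP-style mapping described for Supplementary Table 2 can be sketched in plain Python. Only the key column name "Sl.No" comes from the description; the "Peptide" column name and the peptide strings below are hypothetical placeholders, and the rows are assumed to have been exported from the two sheets beforehand.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]


def index_by_key(rows: List[Row], key: str = "Sl.No") -> Dict[str, Row]:
    """Build a lookup table from the key column to the full row."""
    return {row[key]: row for row in rows}


def lookup_wild_type(mutant_row: Row, wild_index: Dict[str, Row],
                     key: str = "Sl.No") -> Optional[Row]:
    """Return the wild-type row sharing the mutant row's key, if any."""
    return wild_index.get(mutant_row[key])


# Hypothetical rows standing in for the two sheets.
mutant_sheet = [
    {"Sl.No": "1", "Peptide": "MUTANTPEPTIDEK"},
    {"Sl.No": "2", "Peptide": "ANOTHERMUTANTR"},
]
wild_type_sheet = [
    {"Sl.No": "1", "Peptide": "WILDTYPEPEPTIDEK"},
    {"Sl.No": "2", "Peptide": "ANOTHERWILDTYPER"},
]

wild_index = index_by_key(wild_type_sheet)
matched = lookup_wild_type(mutant_sheet[0], wild_index)
missing = lookup_wild_type({"Sl.No": "99", "Peptide": "X"}, wild_index)
```

A dictionary keyed on the serial number behaves like an exact-match VLOOKUP: a missing key returns `None` rather than the nearest row.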

  4. Bioinformatic pipeline from: Increasing confidence for discerning species...

    • datadryad.org
    zip
    Updated Jan 21, 2021
    Cite
    Matt Snyder (2021). Bioinformatic pipeline from: Increasing confidence for discerning species and population compositions from metabarcoding assays of environmental samples: case studies of fishes in the Laurentian Great Lakes and Wabash River [Dataset]. http://doi.org/10.5061/dryad.7m0cfxprx
    Available download formats: zip
    Dataset updated
    Jan 21, 2021
    Dataset provided by
    Dryad
    Authors
    Matt Snyder
    Time period covered
    Aug 14, 2020
    Area covered
    Wabash River, The Great Lakes
    Description

    Pipeline overview

    Demultiplexed raw reads returned from an Illumina HTS platform were trimmed with MetaTrim.py (see MetaTrim_README.md)
    Trimmed reads were merged in the R package Dada2 following Dada2Workflow.R
    The resulting sequence table was dmuxed into fastas by SeqTabToFasta.pl
    FASTAs were subjected to a BLAST search against multiple custom databases with BlastCycle500.pl
    BLAST results were summarized with SummarizeBlast.pl
    

    Scripts and usage:

    MetaTrim.py: See MetaTrim_README.md
    Dada2Workflow.R: workflow for Dada2 R package
    SeqTabToFasta.pl: Run in the directory with the sequence table returned from Dada2. Sequence table must be named SeqTab.txt. Creates a subdir called Dada2ASVs and places FASTA files for each sample in this dir. Sequence titles in these FASTAS have the format > <ASV #> | <# of reads>.
    BlastCycle500.pl: Run in Dada2ASVs. Performs a BLAST search for each ASV in each FASTA against custom databases, returning the top 500 res...
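
The sequence-title format described for the SeqTabToFasta.pl output (`> <ASV #> | <# of reads>`) can be read back with a small parser. This is a sketch only: the exact whitespace and the form of the ASV label are assumptions, and the example records are made up.

```python
import re
from typing import Dict, Iterable

# Assumed header layout: ">", an ASV label, "|", then an integer read count.
HEADER = re.compile(r"^>\s*(\S+)\s*\|\s*(\d+)\s*$")


def read_counts(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Map each ASV label to its read count, taken from the FASTA headers."""
    counts: Dict[str, int] = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        match = HEADER.match(line.strip())
        if match is None:
            raise ValueError(f"unexpected header format: {line!r}")
        counts[match.group(1)] = int(match.group(2))
    return counts


# Made-up records in the assumed layout.
example = [">1 | 1520", "ACGTACGT", ">2 | 37", "TTGACCAA"]
asv_counts = read_counts(example)
```

Raising on an unexpected header keeps silent mis-parses from propagating into downstream read tallies.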
    
  5. Data from: CAPRG: Sequence Assembling Pipeline for Next Generation...

    • datasetcatalog.nlm.nih.gov
    Updated Feb 3, 2012
    Cite
    George, Glover; Perkins, Edward J.; Pham, Don; Elasri, Mohamed O.; Rawat, Arun; Scanlan, Leona D.; Gust, Kurt A.; Vulpe, Chris (2012). CAPRG: Sequence Assembling Pipeline for Next Generation Sequencing of Non-Model Organisms [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001144468
    Dataset updated
    Feb 3, 2012
    Authors
    George, Glover; Perkins, Edward J.; Pham, Don; Elasri, Mohamed O.; Rawat, Arun; Scanlan, Leona D.; Gust, Kurt A.; Vulpe, Chris
    Description

    Our goal is to introduce and describe the utility of a new pipeline “Contigs Assembly Pipeline using Reference Genome” (CAPRG), which has been developed to assemble “long sequence reads” for non-model organisms by leveraging a reference genome of a closely related phylogenetic relative. To facilitate this effort, we utilized two avian transcriptomic datasets generated using ROCHE/454 technology as test cases for CAPRG assembly. We compared the results of CAPRG assembly using a reference genome with the results of existing methods that utilize de novo strategies such as VELVET, PAVE, and MIRA by employing parameter space comparisons (intra-assembling comparison). CAPRG performed as well or better than the existing assembly methods based on various benchmarks for “gene-hunting.” Further, CAPRG completed the assemblies in a fraction of the time required by the existing assembly algorithms. Additional advantages of CAPRG included reduced contig inflation resulting in lower computational resources for annotation, and functional identification for contigs that may be categorized as “unknowns” by de novo methods. In addition to providing evaluation of CAPRG performance, we observed that the different assembly (inter-assembly) results could be integrated to enhance the putative gene coverage for any transcriptomics study.

  6. Data from: Template-specific optimization of NGS genotyping pipelines...

    • search.dataone.org
    • data.niaid.nih.gov
    Updated Jul 26, 2025
    Cite
    Artemis Efstratiou; Arnaud Gaigher; Sven Künzel; Ana Teles; Tobias L. Lenz (2025). Template-specific optimization of NGS genotyping pipelines reveals allele-specific variation in MHC gene expression [Dataset]. http://doi.org/10.5061/dryad.qfttdz0qb
    Dataset updated
    Jul 26, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Artemis Efstratiou; Arnaud Gaigher; Sven Künzel; Ana Teles; Tobias L. Lenz
    Time period covered
    Jan 1, 2024
    Description

    Using high-throughput sequencing for precise genotyping of multi-locus gene families, such as the Major Histocompatibility Complex (MHC), remains challenging, due to the complexity of the data and difficulties in distinguishing genuine from erroneous variants. Several dedicated genotyping pipelines for data from high-throughput sequencing, such as next-generation sequencing (NGS), have been developed to tackle the ensuing risk of artificially inflated diversity. Here, we thoroughly assess three such multi-locus genotyping pipelines for NGS data, the DOC method, AmpliSAS and ACACIA, using MHC class IIβ datasets of three-spined stickleback gDNA, cDNA, and “artificial” plasmid samples with known allelic diversity. We show that genotyping of gDNA and plasmid samples at optimal pipeline parameters was highly accurate and reproducible across methods. However, for cDNA data, gDNA-optimal parameter configuration yielded decreased overall genotyping precision and consistency between pipelines. F...

    Description of the data and file structure

    This submission consists of two Excel files.

    The file 'Data_MHC-I' includes information regarding the 10 three-spined stickleback families included in our MHC-I genotyping dataset, and is separated into three sheets:

    (i) Families overview, with information regarding the number of offspring and individual IDs of the families (columns: family ID, and corresponding offspring IDs)

    (ii) Family genotypes (columns: Family ID, Inferred Parental Genotype1, Inferred Parental Genotype2, Observed Offspring Genotypes, Number of Alleles Per Genotype, and Number of Offspring), and

    (iii) Allele segregation by family, where a table is presented for each of the 10 families used to infer the genetic linkage between MHC-I loci of the three-spined stickleback.

    The file 'Data_MHC-II' includes the genotypes of all samples included in our M...

  7. Overview of the parameters investigated for the variant calling pipeline...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Sarah Sandmann; Aniek O. de Graaf; Bert A. van der Reijden; Joop H. Jansen; Martin Dugas (2023). Overview of the parameters investigated for the variant calling pipeline with GLM. [Dataset]. http://doi.org/10.1371/journal.pone.0171983.t001
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Sarah Sandmann; Aniek O. de Graaf; Bert A. van der Reijden; Joop H. Jansen; Martin Dugas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview of the parameters investigated for the variant calling pipeline with GLM.

  8. MutAid: Sanger and NGS Based Integrated Pipeline for Mutation...

    • plos.figshare.com
    txt
    Updated Jun 3, 2023
    Cite
    Ram Vinay Pandey; Stephan Pabinger; Albert Kriegner; Andreas Weinhäusel (2023). MutAid: Sanger and NGS Based Integrated Pipeline for Mutation Identification, Validation and Annotation in Human Molecular Genetics [Dataset]. http://doi.org/10.1371/journal.pone.0147697
    Available download formats: txt
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ram Vinay Pandey; Stephan Pabinger; Albert Kriegner; Andreas Weinhäusel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Traditional Sanger sequencing as well as Next-Generation Sequencing have been used for the identification of disease causing mutations in human molecular research. The majority of currently available tools are developed for research and explorative purposes and often do not provide a complete, efficient, one-stop solution. As the focus of currently developed tools is mainly on NGS data analysis, no integrative solution for the analysis of Sanger data is provided and consequently a one-stop solution to analyze reads from both sequencing platforms is not available. We have therefore developed a new pipeline called MutAid to analyze and interpret raw sequencing data produced by Sanger or several NGS sequencing platforms. It performs format conversion, base calling, quality trimming, filtering, read mapping, variant calling, variant annotation and analysis of Sanger and NGS data under a single platform. It is capable of analyzing reads from multiple patients in a single run to create a list of potential disease causing base substitutions as well as insertions and deletions. MutAid has been developed for expert and non-expert users and supports four sequencing platforms including Sanger, Illumina, 454 and Ion Torrent. Furthermore, for NGS data analysis, five read mappers including BWA, TMAP, Bowtie, Bowtie2 and GSNAP and four variant callers including GATK-HaplotypeCaller, SAMTOOLS, Freebayes and VarScan2 pipelines are supported. MutAid is freely available at https://sourceforge.net/projects/mutaid.

  9. Supporting data for the "Development of tailored NGS data analysis pipeline...

    • datahub.hku.hk
    Updated Oct 7, 2022
    Cite
    Yao Lei (2022). Supporting data for the "Development of tailored NGS data analysis pipeline for the diagnosis of Neuromuscular disorders" [Dataset]. http://doi.org/10.25442/hku.21184174.v1
    Dataset updated
    Oct 7, 2022
    Dataset provided by
    HKU Data Repository
    Authors
    Yao Lei
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset is an Excel file that summarises information on patients in whom potential causal variant(s), or VUS(s) incompatible with the clinical diagnosis, were found. It includes the patients' gender, symptom onset age, age at last follow-up, clinical presentation, provisional clinical diagnosis, prior genetic tests and results, availability of the WES and WGS data, and WES and WGS of their parents.

    The first sheet lists the patients in whom potential causal variants were found. The last columns give the identified potential causal variants, the gene of the variants, the inheritance model, and the ACMG guideline classification of the variants.

    The second sheet lists the patients in whom VUS(s) incompatible with the clinical diagnosis were found. The last three columns give the identified VUS(s) incompatible with the clinical diagnosis, the gene of the VUS(s), and the ACMG guideline classification of the VUS(s).

  10. Development of containerized pipelines for the reproducible analysis of...

    • scholardata.sun.ac.za
    zip
    Updated Feb 7, 2025
    Cite
    RUVARASHE JOYLYNE Madzime (2025). Development of containerized pipelines for the reproducible analysis of amplicon sequence-, shotgun metagenomic- and metatranscriptomic data [Dataset]. http://doi.org/10.25413/sun.25383232.v1
    Available download formats: zip
    Dataset updated
    Feb 7, 2025
    Dataset provided by
    SUNScholarData
    Authors
    RUVARASHE JOYLYNE Madzime
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Three independent containerized pipelines were developed to analyse shotgun metagenomic, amplicon sequencing and metatranscriptomic data. The pipelines are meant to improve reproducibility in analysing these data. Containers were developed using Singularity for efficient use in HPC environments. The pipelines were developed using Nextflow. The pipelines were tested with their respective data on a local server (Aither) for the server environment and at the Centre of High Performance Computing (CHPC) for the cluster environment. These files are table outputs from running the amplicon sequence pipeline on the cluster and server.

  11. Matching mutant raw reads example.

    • plos.figshare.com
    docx
    Updated Jun 1, 2023
    Cite
    Kinga M. Bujakowska; Joseph White; Emily Place; Mark Consugar; Jason Comander (2023). Matching mutant raw reads example. [Dataset]. http://doi.org/10.1371/journal.pone.0142614.s001
    Available download formats: docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Kinga M. Bujakowska; Joseph White; Emily Place; Mark Consugar; Jason Comander
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Using the command “zgrep GAAAAAAGGAGGCCGGGCGCGGT D00379_000148_GCCAAT_L001_R2_001.fastq.gz”, 23 reads were obtained. The reads were aligned manually for display purposes and the sequence matching the probe was underlined. A space was added before the canonical 5’ end of the Alu insertion (GGCCGGG…). The read length of 121 bp was too short to span the entire Alu insertion (even if each read was computationally merged with its mate pair, not shown). (DOCX)
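
The same kind of search can be sketched in Python with the standard `gzip` module. Like `zgrep`, this matches the probe as a plain substring on every line of the compressed file and does not look for the reverse complement. The function name and the tiny synthetic FASTQ file are illustrative assumptions; only the probe sequence comes from the description.

```python
import gzip
import os
import tempfile
from typing import List


def matching_lines(path: str, probe: str) -> List[str]:
    """Return every line of a gzip-compressed text file containing probe."""
    hits: List[str] = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if probe in line:
                hits.append(line.rstrip("\n"))
    return hits


probe = "GAAAAAAGGAGGCCGGGCGCGGT"  # probe sequence quoted in the description

# Synthetic two-read FASTQ file, written only to exercise the function.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.fastq.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write("@read1\nTT" + probe + "AA\n+\n" + "I" * 27 + "\n")
        out.write("@read2\nACGTACGT\n+\nIIIIIIII\n")
    hits = matching_lines(demo_path, probe)
```

Reading the file line by line keeps memory use flat, which matters for full-size FASTQ files.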

  12. Files for publication "Microseek: A Protein-Based Metagenomic Pipeline for...

    • zenodo.org
    xz
    Updated Jul 29, 2022
    Cite
    Thomas Bigot; Philippe Pérot; Sarah Temmam; Béatrice Regnault; Marc Eloit (2022). Files for publication "Microseek: A Protein-Based Metagenomic Pipeline for Virus Diagnostic and Discovery" [Dataset]. http://doi.org/10.5281/zenodo.4475261
    Available download formats: xz
    Dataset updated
    Jul 29, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Thomas Bigot; Philippe Pérot; Sarah Temmam; Béatrice Regnault; Marc Eloit
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context

    These files correspond to the article “Microseek: A Protein-Based Metagenomic Pipeline for Virus Diagnostic and Discovery” submitted to Genes.

    File content

    • empty_matrices: 50M-read Tissues and Plasma matrices, no spike;
    • matrices_spiked_known_viruses: 50M-read Tissues and Plasma matrices spiked with six known viruses at d1, d10, d100;
    • matrices_spiked_neo_viruses: 50M-read Tissues and Plasma matrices spiked with 3 Neopneumoviruses at d1 and d10;
    • neo_viruses: Nucleotide and protein sequences of 3 Neopneumoviruses;
    • output_microseek: Microseek outputs, raw results and results after background filtration.

    File listing

    empty_matrices.tar.xz
    ├── plasma.fastq 
    └── tissue.fastq 
    
    matrices_spiked_known_viruses
    ├── d1 
    │  ├── spiked_plasma.fastq
    │  └── spiked_tissue.fastq
    ├── d10 
    │  ├── spiked_plasma.fastq
    │  └── spiked_tissue.fastq
    └── d100 
      ├── spiked_plasma.fastq
      └── spiked_tissue.fastq
    
    matrices_spiked_neo_viruses.tar.xz
    ├── d1 
    │  ├── plasma_spiked_with_neo1.fastq
    │  ├── plasma_spiked_with_neo2.fastq
    │  ├── plasma_spiked_with_neo3.fastq
    │  ├── tissue_spiked_with_neo1.fastq
    │  ├── tissue_spiked_with_neo2.fastq
    │  └── tissue_spiked_with_neo3.fastq
    └── d10 
      ├── plasma_spiked_with_neo1.fastq
      ├── plasma_spiked_with_neo2.fastq
      ├── plasma_spiked_with_neo3.fastq
      ├── tissue_spiked_with_neo1.fastq
      ├── tissue_spiked_with_neo2.fastq
      └── tissue_spiked_with_neo3.fastq
    
    neo_viruses.tar.xz
    ├── genes 
    │  ├── neo_1.fasta
    │  ├── neo_2.fasta
    │  └── neo_3.fasta
    └── proteins 
      ├── neo_1.fasta
      ├── neo_2.fasta
      └── neo_3.fasta
    
    output_microseek.tar.xz
    ├── empty_matrices
    │  ├── matrix_plasma 
    │  └── matrix_tissue 
    ├── matrices_spiked_known_viruses
    │  ├── filtered
    │  │  ├── d100_plasma 
    │  │  ├── d100_tissue 
    │  │  ├── d10_plasma 
    │  │  ├── d10_tissue 
    │  │  ├── d1_plasma 
    │  │  └── d1_tissue 
    │  └── non_filtered
    │    ├── d100_plasma 
    │    ├── d100_tissue 
    │    ├── d10_plasma 
    │    ├── d10_tissue 
    │    ├── d1_plasma 
    │    └── d1_tissue 
    └── matrices_spiked_neo_viruses
      ├── filtered
      │  ├── plasma_spiked_with_neo1_at_d1 
      │  ├── plasma_spiked_with_neo1_at_d10 
      │  ├── plasma_spiked_with_neo2_at_d1 
      │  ├── plasma_spiked_with_neo2_at_d10 
      │  ├── plasma_spiked_with_neo3_at_d1 
      │  ├── plasma_spiked_with_neo3_at_d10 
      │  ├── tissue_spiked_with_neo1_at_d1 
      │  ├── tissue_spiked_with_neo1_at_d10 
      │  ├── tissue_spiked_with_neo2_at_d1 
      │  ├── tissue_spiked_with_neo2_at_d10 
      │  ├── tissue_spiked_with_neo3_at_d1 
      │  └── tissue_spiked_with_neo3_at_d10 
      └── non-filtered
        ├── plasma_spiked_with_neo1_at_d1 
        ├── plasma_spiked_with_neo1_at_d10 
        ├── plasma_spiked_with_neo2_at_d1 
        ├── plasma_spiked_with_neo2_at_d10 
        ├── plasma_spiked_with_neo3_at_d1 
        ├── plasma_spiked_with_neo3_at_d10 
        ├── tissue_spiked_with_neo1_at_d1 
        ├── tissue_spiked_with_neo1_at_d10 
        ├── tissue_spiked_with_neo2_at_d1 
        ├── tissue_spiked_with_neo2_at_d10 
        ├── tissue_spiked_with_neo3_at_d1 
        └── tissue_spiked_with_neo3_at_d10 
    
    

  13. Collection of datasets containing the TaxaSE bacterial taxonomic annotation...

    • researchdata.edu.au
    Updated May 16, 2023
    Cite
    Jeffries Thomas; Hamonts Kelly; Ijaz Ali (2023). Collection of datasets containing the TaxaSE bacterial taxonomic annotation pipeline, SILVA insilico datasets and Illumina sequencing data from sugarcane bacterial (16S) including subhabitats from soil, rhizosphere, stem and root [Dataset]. https://researchdata.edu.au/collection-datasets-containing-stem-root/2368242
    Dataset updated
    May 16, 2023
    Dataset provided by
    Western Sydney University
    Authors
    Jeffries Thomas; Hamonts Kelly; Ijaz Ali
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Feb 1, 2013 - Feb 28, 2017
    Description

    This dataset contains the TaxaSE bacterial taxonomic annotation pipeline (including its source code and associated data files). In silico data generated from the SILVA Release 123 database are also provided, covering both the whole-SILVA and the Removal of Taxa based validation approaches, which were used to compare the Shannon entropy-based sequence similarity approach with Percentage Identity (via USEARCH v7.0.1090 32bit, see Edgar 2010). Lastly, the raw FASTQ files as well as processed FASTA files from sugarcane (Saccharum spp.) are included, consisting of samples from soil, rhizosphere, root and stem sub-habitats, alongside results generated in QIIME 1.9.1 (Caporaso et al. 2010).

    The quality of all Illumina R1 and R2 reads was assessed visually using FASTQC (Andrews 2016); reads were merged using FLASH (Magoč & Salzberg 2011) and converted to FASTA format using QIIME’s “convert_fastaqual_fastq.py” script. Alpha diversity and beta diversity analyses were performed in QIIME, with TaxaSE results converted to a QIIME-compatible format for comparison. In silico data were generated using the MicroSim simulator from the SILVA Release 123 database. Sugarcane leaf, stalk, root and rhizosphere soil samples were collected by Dr. Kelly Hamonts at Hawkesbury Institute for the Environment, Western Sydney University, Australia, in November 2014 from eight sugarcane fields growing three sugarcane varieties (KQ228, MQ239 and Q240) near Ingham, Queensland, Australia.

    In each field, 3 stools were randomly selected and samples were collected from 2 plants per stool. Samples were snap-frozen in liquid nitrogen in the field, transported to the laboratory on dry ice and stored at -80 °C. Frozen sugarcane tissue samples were ground using a mortar and pestle, and DNA was extracted from the resulting powder using the MoBio PowerPlant DNA extraction kit, following the manufacturer’s instructions. The MoBio PowerSoil DNA extraction kit was used to extract DNA from the soil samples. Bacterial 16S rRNA amplicon sequencing was performed by the NGS facility at Western Sydney University using Illumina MiSeq (2x 301 bp PE) and the 341F/805R primer set.

  14. Additional file 1 of Systematic and benchmarking studies of pipelines for...

    • datasetcatalog.nlm.nih.gov
    • springernature.figshare.com
    Updated Apr 13, 2023
    Cite
    Sun, Lei; Yan, Qin; Zhang, Xin; Li, Qi-gang; Liu, Yong-feng; Yang, Wei; Lin, Qun-ting (2023). Additional file 1 of Systematic and benchmarking studies of pipelines for mammal WGBS data in the novel NGS platform [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001019533
    Dataset updated
    Apr 13, 2023
    Authors
    Sun, Lei; Yan, Qin; Zhang, Xin; Li, Qi-gang; Liu, Yong-feng; Yang, Wei; Lin, Qun-ting
    Description

    Additional file 1: Table S3. Property among GenoLab M, NextSeq X and NovaSeq 6000 platforms.

  15. Global Chip Sequencing Market Research Report: By Application (Whole Genome...

    • wiseguyreports.com
    Updated Jul 19, 2024
    Cite
    Wiseguy Research Consultants Pvt Ltd (2024). Global Chip Sequencing Market Research Report: By Application (Whole Genome Sequencing, Exome Sequencing, Single-Cell Sequencing, Metagenomics, RNA Sequencing), By Technology (Next-Generation Sequencing (NGS), Third-Generation Sequencing (TGS), Nanopore Sequencing, Single-Molecule Sequencing), By Sample Type (Blood, Saliva, Urine, Tissue, Tumor Biopsy), By End User (Hospitals & Clinics, Research Institutions, Pharmaceutical and Biotechnology Companies, Diagnostic Laboratories, Personalized Medicine Companies), By Data Analysis Pipeline (Data Preprocessing and Quality Control, Alignment and Variant Calling, Interpretation and Reporting, Cloud-Based Analysis, Open-Source Platforms) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/reports/chip-sequencing-market
    Dataset updated
    Jul 19, 2024
    Dataset authored and provided by
    Wiseguy Research Consultants Pvt Ltd
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Jan 7, 2024
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2024
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2023: 19.37 (USD Billion)
    MARKET SIZE 2024: 21.65 (USD Billion)
    MARKET SIZE 2032: 52.8 (USD Billion)
    SEGMENTS COVERED: Application, Technology, Sample Type, End User, Data Analysis Pipeline, Regional
    COUNTRIES COVERED: North America, Europe, APAC, South America, MEA
    KEY MARKET DYNAMICS: (1) Technological advancements; (2) Rising demand for personalized medicine; (3) Growing prevalence of genetic diseases; (4) Rapidly expanding healthcare IT sector; (5) Increasing government funding for genetic research
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: Oxford Nanopore Technologies, PerkinElmer, Macrogen, Pacific Biosciences, Illumina, Complete Genomics, 10x Genomics, Agilent Technologies, Geneplus, MGI Tech Co, Novogene, Bio-Rad Laboratories, Thermo Fisher Scientific, BGI Group, WuXi NextCODE
    MARKET FORECAST PERIOD: 2024 - 2032
    KEY MARKET OPPORTUNITIES: (1) Advancements in single-cell sequencing; (2) Growing demand for precision medicine; (3) Increased accessibility to next-generation sequencing; (4) Technological advancements in chip design; (5) Expansion into emerging markets
    COMPOUND ANNUAL GROWTH RATE (CAGR): 11.79% (2024 - 2032)
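
The listed growth rate can be cross-checked against the two market-size figures with the standard compound annual growth rate formula. This is a sketch under the assumption that the 2024 and 2032 values are the endpoints and that eight annual periods separate them; the function name is illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate over `years` annual periods."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Market sizes in USD billion, taken from the listing above.
rate = cagr(21.65, 52.8, 2032 - 2024)
print(f"{rate:.2%}")
```

Under these assumptions the computed rate comes out close to the 11.79% stated in the listing.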
  16. NGS Data Analysis Services Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Jul 5, 2025
    Growth Market Reports (2025). NGS Data Analysis Services Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/ngs-data-analysis-services-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Jul 5, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    NGS Data Analysis Services Market Outlook



    According to our latest research, the global NGS Data Analysis Services market size was valued at USD 1.95 billion in 2024, reflecting robust expansion driven by the increasing adoption of next-generation sequencing (NGS) technologies across various sectors. The market is projected to achieve a CAGR of 17.8% from 2025 to 2033, reaching an estimated value of USD 7.24 billion by 2033. This impressive growth trajectory is underpinned by the rising demand for precision medicine, advancements in genomics research, and the growing need for sophisticated bioinformatics solutions.




    The primary growth factor for the NGS Data Analysis Services market is the exponential increase in genomic data generated by NGS platforms, necessitating advanced data analysis solutions. As sequencing costs continue to decline and throughput increases, research institutions, healthcare providers, and pharmaceutical companies are generating vast amounts of complex sequencing data. This surge in data volume has created a significant demand for specialized NGS data analysis services that can efficiently process, interpret, and transform raw sequencing data into actionable insights. The complexity of NGS data, which requires expertise in bioinformatics, machine learning, and cloud computing, has further fueled the reliance on third-party service providers offering end-to-end data analysis solutions.




    Another critical driver is the expanding application of NGS technologies in clinical diagnostics, drug discovery, and personalized medicine. Clinical laboratories and hospitals are increasingly leveraging NGS data analysis services to identify genetic mutations, detect rare diseases, and guide targeted therapies. The integration of NGS into routine clinical workflows has accelerated the need for accurate and rapid data analysis, ensuring timely and precise patient care. In the pharmaceutical sector, NGS data analysis services are instrumental in biomarker discovery, pharmacogenomics, and the development of novel therapeutics, further propelling market growth. Additionally, the adoption of NGS in agriculture and animal research for crop improvement and disease resistance studies is broadening the market’s application scope.




    The advancement of bioinformatics tools and cloud-based data analysis platforms is also contributing significantly to the growth of the NGS Data Analysis Services market. Cloud computing has revolutionized the way NGS data is managed, stored, and analyzed by offering scalable, secure, and cost-effective solutions. Many service providers now offer cloud-based platforms that facilitate seamless data sharing, collaboration, and real-time analysis, enabling researchers and clinicians to derive rapid insights from sequencing projects. The integration of artificial intelligence and machine learning algorithms into bioinformatics pipelines is enhancing the accuracy, efficiency, and scalability of NGS data analysis, thereby attracting a broader customer base.




    From a regional perspective, North America continues to dominate the NGS Data Analysis Services market, accounting for the largest share in 2024, followed by Europe and Asia Pacific. The presence of leading genomic research institutes, favorable government initiatives, and significant investments in precision medicine and biotechnology are key factors driving the North American market. Europe is witnessing substantial growth due to increasing funding for genomics research and the expansion of clinical NGS applications. Meanwhile, Asia Pacific is emerging as a high-growth region, fueled by rising healthcare expenditure, growing awareness of genomics, and the establishment of new sequencing facilities. The Middle East & Africa and Latin America, while smaller in market size, are also showing steady progress as NGS adoption spreads globally.





    Service Type Analysis



    The Service Type segment of the NGS Data Analysis Services market encompasses a broad range of offerings, including Data Preproc

  17. Training material for the course "Exome analysis with GALAXY"

    • zenodo.org
    • explore.openaire.eu
    bin, txt, vcf
    Updated Jan 24, 2020
    Paolo Uva; Gianmauro Cuccuru (2020). Training material for the course "Exome analysis with GALAXY" [Dataset]. http://doi.org/10.5281/zenodo.61377
    Available download formats: bin, txt, vcf
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Paolo Uva; Gianmauro Cuccuru
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Galaxy is an open-source, web-based platform for data-intensive biomedical research. It makes bioinformatics applications accessible to users without programming skills, enabling them to build analysis workflows for NGS data easily.

    The course "Exome analysis using Galaxy" is aimed at PhD students, biologists, clinicians and researchers who are analysing, or will soon need to analyse, high-throughput exome sequencing data. The aim of the course is to familiarise participants with the Galaxy platform and prepare them to work independently, using state-of-the-art tools for the analysis of exome sequencing data.

    The course will be delivered using a mixture of lectures and computer-based hands-on practical sessions. Lectures will provide an up-to-date overview of the strategies for the analysis of exome next-generation sequencing experiments, starting from the raw sequence data. Analyses include sequence quality control, alignment to a reference genome, refinement of aligned sequences, variant calling, annotation and interpretation, and tools for visual inspection of results. Participants will apply the knowledge gained during the course to the analysis of real Illumina exome datasets, and implement workflows to reproduce the complete analysis. After the course, participants will be able to create pipelines for their own analyses.

    These are the datasets needed for this course.

  18. Replication data for: "An easy-to-use pipeline to analyze amplicon-based...

    • b2find.eudat.eu
    • dataverse.csuc.cat
    Updated Nov 24, 2024
    (2024). Replication data for: "An easy-to-use pipeline to analyze amplicon-based Next Generation Sequencing results of human mitochondrial DNA from degraded samples" [Dataset]. https://b2find.eudat.eu/dataset/2fbe69c2-9db7-5bef-b978-59ac24a4cf75
    Dataset updated
    Nov 24, 2024
    Description

    The dataset contains the raw data, in FastQ format, of the sequences used to optimize the script "NCR-mtDNA_ampliconbasedngs", available in the GitHub repository named "DanielRCA/NCR-mtDNA_ampliconbasedngs". The dataset includes 163 samples (15 present-day samples and 148 ancient samples from before the 20th century). For each sample, there are two FastQ files, as the sequencing was performed in a paired-end format.

  19. Next Generation Sequencing Market Market Research Report 2033

    • researchintelo.com
    csv, pdf, pptx
    Updated Jul 24, 2025
    Research Intelo (2025). Next Generation Sequencing Market Market Research Report 2033 [Dataset]. https://researchintelo.com/report/next-generation-sequencing-market-market
    Available download formats: csv, pptx, pdf
    Dataset updated
    Jul 24, 2025
    Dataset authored and provided by
    Research Intelo
    License

    https://researchintelo.com/privacy-and-policy

    Time period covered
    2024 - 2033
    Area covered
    Global
    Description

    Next Generation Sequencing (NGS) Market Outlook



    According to the latest research, the global Next Generation Sequencing (NGS) market size reached USD 14.8 billion in 2024. The market is demonstrating robust expansion, driven by rapid technological advancements and increasing adoption across healthcare and research sectors. The market is expected to register a CAGR of 16.2% from 2025 to 2033, propelling the global market size to approximately USD 47.7 billion by 2033. This impressive growth trajectory is primarily fueled by the rising demand for precision medicine, increasing investments in genomic research, and the expanding application of NGS technologies in clinical diagnostics and drug discovery.



    One of the most significant growth factors for the Next Generation Sequencing market is the increasing prevalence of chronic and genetic diseases worldwide. The ability of NGS to provide high-throughput, accurate, and cost-effective sequencing has revolutionized how researchers and clinicians approach the diagnosis and treatment of complex diseases. With the global burden of cancer, rare genetic disorders, and infectious diseases on the rise, healthcare providers are increasingly adopting NGS-based solutions to enable early detection and personalized treatment strategies. Additionally, the growing awareness among patients and practitioners about the benefits of genomics in healthcare is further accelerating the adoption of NGS technologies. The continuous decrease in sequencing costs, paired with improved accuracy and speed, has made NGS accessible to a broader range of healthcare institutions, fueling market expansion.



    Another key driver of market growth is the surge in research and development activities, particularly in the fields of genomics, transcriptomics, and epigenomics. Academic institutions, research organizations, and pharmaceutical companies are heavily investing in NGS technologies to facilitate large-scale genomic studies, biomarker discovery, and novel drug development. The integration of NGS platforms into drug discovery pipelines allows for a deeper understanding of disease mechanisms, identification of therapeutic targets, and development of targeted therapies. The rapid evolution of NGS technologies, such as single-molecule real-time sequencing and nanopore sequencing, is further enhancing the capabilities of researchers to generate comprehensive genomic data, thus propelling market growth. The increasing number of collaborative projects and government initiatives supporting genomics research is also creating a favorable environment for market expansion.



    A third major growth factor is the broadening application spectrum of NGS beyond human healthcare. The technology is increasingly being utilized in agriculture, animal research, and environmental studies. In agriculture, NGS is used for crop improvement, disease resistance breeding, and food safety testing, enabling the development of high-yield, resilient crop varieties. In animal research, NGS is facilitating the study of genetic traits, disease susceptibility, and evolutionary biology. The versatility of NGS platforms and their ability to generate high-quality data across diverse sample types are making them indispensable tools in various scientific domains. As industries recognize the potential of genomics to address critical challenges, the demand for NGS solutions continues to rise, contributing to the overall growth of the market.



    From a regional perspective, North America currently dominates the Next Generation Sequencing market, accounting for the largest share in 2024. This leadership is attributed to the presence of advanced healthcare infrastructure, significant investments in genomics research, and a high concentration of major market players. The region's strong regulatory framework and supportive reimbursement policies are also facilitating the adoption of NGS technologies. Europe follows as the second-largest market, driven by increasing government funding for genomic medicine and the presence of leading research institutions. Meanwhile, the Asia Pacific region is witnessing the fastest growth, fueled by rising healthcare expenditures, expanding research capabilities, and growing awareness of precision medicine. Latin America and the Middle East & Africa are emerging markets, showing steady growth due to improving healthcare infrastructure and increasing investments in biotechnology. Overall, the global NGS market is poised for significant expansion across all major regions, supported by technological innovation and growing demand for genomic solutions.


  20. Example of a guide map file for use in the TRITEX genome assembly pipeline

    • doi.ipk-gatersleben.de
    Updated Nov 18, 2020
    Martin Mascher (2020). Example of a guide map file for use in the TRITEX genome assembly pipeline [Dataset]. https://doi.ipk-gatersleben.de/DOI/c6f2608f-6e17-45b5-bf9b-4a6fd0cc94ef/039f8fdd-7077-4158-9efd-712e0b485526/2
    Dataset updated
    Nov 18, 2020
    Dataset provided by
    e!DAL - Plant Genomics and Phenomics Research Data Repository (PGP), IPK Gatersleben, Seeland OT Gatersleben, Corrensstraße 3, 06466, Germany
    Authors
    Martin Mascher
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Example of a guide map file for use in the TRITEX assembly pipeline [doi:10.1186/s13059-019-1899-5]. The guide map is provided in RDS format (serialized R object) for direct use in the TRITEX pipeline, and in a tabular text file in TSV format. This example uses the POPSEQ genetic map of the barley genome [doi:10.1111/tpj.12319].

