4 datasets found
  1. NCBI Virus - v3g7-abyx - Archive Repository

    • healthdata.gov
    application/rdfxml +5
    Updated Jul 16, 2025
    Cite
    (2025). NCBI Virus - v3g7-abyx - Archive Repository [Dataset]. https://healthdata.gov/dataset/NCBI-Virus-v3g7-abyx-Archive-Repository/49gk-bnyy
    Explore at:
    Available download formats
    csv, application/rdfxml, tsv, json, xml, application/rssxml
    Dataset updated
    Jul 16, 2025
    Description

    This dataset tracks the updates made on the dataset "NCBI Virus" as a repository for previous versions of the data and metadata.

  2. COVID-19 Genome Sequence Dataset

    • catalog.midasnetwork.us
    • registry.opendata.aws
    Updated Jul 6, 2023
    Cite
    MIDAS Coordination Center (2023). COVID-19 Genome Sequence Dataset [Dataset]. https://catalog.midasnetwork.us/collection/168
    Explore at:
    Dataset updated
    Jul 6, 2023
    Dataset authored and provided by
    MIDAS Coordination Center
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Variables measured
    disease, COVID-19, pathogen, Homo sapiens, host organism, infectious disease, sequence collection, Severe acute respiratory syndrome coronavirus 2
    Dataset funded by
    National Institute of General Medical Sciences
    Description

    A centralized sequence repository for all strains of the novel coronavirus (SARS-CoV-2) submitted to the National Center for Biotechnology Information (NCBI).

  3. Selection pressure analysis of dengue virus complete genome and E gene...

    • datadryad.org
    • search.dataone.org
    • +2 more
    zip
    Updated May 31, 2024
    Cite
    Zilwa Mumtaz; Saeeda Zia; Rashid Saif; Muhammad Farhan Ul Haque; Muhammad Zubair Yousaf (2024). Selection pressure analysis of dengue virus complete genome and E gene nucleotide sequences from Pakistan [Dataset]. http://doi.org/10.5061/dryad.m63xsj49z
    Explore at:
    Available download formats
    zip
    Dataset updated
    May 31, 2024
    Dataset provided by
    Dryad
    Authors
    Zilwa Mumtaz; Saeeda Zia; Rashid Saif; Muhammad Farhan Ul Haque; Muhammad Zubair Yousaf
    Area covered
    Pakistan
    Description

    Selection pressure analysis of dengue virus complete genome and E gene nucleotide sequences from Pakistan

    https://doi.org/10.5061/dryad.cjsxksnff

    Dataset Summary:

    This dataset contains 43 E gene and 44 complete genome nucleotide sequences of the dengue virus, encompassing all four serotypes (DENV-1, DENV-2, DENV-3, and DENV-4) identified in Pakistan to date. Additionally, the dataset includes four reference sequences of the dengue virus and six sequences from regions outside Pakistan to provide a broader comparative perspective. All sequences were retrieved from the Virus Pathogen Resource (ViPR) database.

    Experimental Procedures:

    1. Data Collection and Sequence Alignment: Sequences were aligned using MUSCLE for initial processing and MEGA X for detailed phylogenetic analyses. This dual approach ensures robust sequence alignment critical for accurate downstream analysis.

    2. Phylogenetic Analysis: After alignmen...

  4. Output of mgs-workflow 2.1.0, used for "Inferring the sensitivity of...

    • figshare.com
    txt
    Updated Feb 17, 2025
    Cite
    Simon Grimm (2025). Output of mgs-workflow 2.1.0, used for "Inferring the sensitivity of wastewater metagenomic sequencing for early detection of viruses: a statistical modelling study" [Dataset]. http://doi.org/10.6084/m9.figshare.28395104.v1
    Explore at:
    Available download formats
    txt
    Dataset updated
    Feb 17, 2025
    Dataset provided by
    figshare
    Authors
    Simon Grimm
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This repo contains results of biocomputational analysis of four wastewater sequencing datasets, used in the paper "Inferring the sensitivity of wastewater metagenomic sequencing for early detection of viruses: a statistical modelling study".

    The bioprojects for the studies are:

    • Brinch 2020: PRJEB13832, PRJEB34633
    • Crits-Christoph 2021: PRJNA661613
    • Rothman 2021: PRJNA729801
    • Spurbeck 2023: PRJNA924011

    The computational pipeline used for analysis can be found here: https://github.com/naobservatory/mgs-workflow/tree/2.1.0

    Here are the methods for study selection and processing:

    We performed a literature search for studies which (i) generated large (>100M read pairs), untargeted shotgun W-MGS datasets from raw treatment plant influent, (ii) used sample preparation methods well-suited for broad enrichment of viruses, and (iii) were performed in regions and time periods for which good public-health data were available.

    We selected three RNA-sequencing studies which fit all of these criteria: Crits-Christoph et al. 2021, Rothman et al. 2021, and Spurbeck et al. 2023. While we were unable to find any DNA-sequencing studies that fulfilled all three criteria, we were still interested in assessing the capability of DNA sequencing to detect human-infecting viruses. Therefore, we included the DNA sequencing study by Brinch et al. 2020, which fulfils criteria (i) and (iii).

    All four of these studies conducted composite sampling of municipal influent (the three RNA studies all used 24-hour composite samples, while Brinch used 12-hour composites) and sequenced processed samples with paired-end Illumina technology. The three RNA sequencing studies were conducted in the United States, sampling wastewater from California and Ohio between 2020 and 2022. Brinch sampled wastewater in Copenhagen, Denmark from 2015 to 2018.

    For these studies, we obtained sequencing reads from the European Nucleotide Archive and identified virus reads using Bowtie2 and Kraken2, with the relative abundance of each virus calculated as the number of high-quality, non-duplicate reads assigned to that virus divided by the total number of sequencing reads (appendix 5 p 23).

    In addition to untargeted W-MGS data, Crits-Christoph and Rothman also sequenced samples that had undergone hybridization-capture enrichment with the Illumina Respiratory Virus Panel (RVP). Data from these samples underwent the same bioinformatic analysis as the untargeted samples from the same studies.

    Here is the supplement with additional details:

    FASTQ files for each included study were obtained from the Sequence Read Archive and analyzed with a custom computational pipeline (see “Data Sharing”) as follows:

    Raw reads were screened for adapter contamination with Cutadapt, Trimmomatic, and FASTP. Additionally, FASTP was used to trim low-quality and low-complexity sequences. Cleaned reads underwent deduplication with Clumpify.

    Deduplicated reads were ribodepleted with BBDuk, using SILVA SSU and LSU sequence databases, version 138.1.

    Ribodepleted reads were then separately analyzed in a taxonomic profiling pipeline and a human-infecting virus identification pipeline. In the taxonomic pipeline, paired-end reads were merged with BBMerge, with reads that failed to merge being concatenated with an intervening “N” base. Sequences were then passed to Kraken2 for taxonomic assignment, using the Standard database (2022-12-01 build), then summarized with Bracken.

    The human-infecting virus pipeline included the following steps:

    Beforehand, a database of human-infecting viral genomes was generated by obtaining all human-infecting virus taxonomy identifiers from Virus-Host DB; expanding this list to include all descendant identifiers; downloading all viral genomes corresponding to these identifiers from GenBank; and filtering the resulting database to remove transgenic and contaminated sequences.

    Ribodepleted reads were aligned against this database with Bowtie2 to identify putative human-infecting virus reads. Each read is assigned an NCBI taxonomy ID (taxid) corresponding to the best alignment found by Bowtie2. Putative human-infecting virus reads were filtered by aligning them to reference genomes that include human, cow, pig, mouse and E. coli, as well as various genetic engineering vectors. Alignment was performed by Bowtie2 and BBMap in series.

    After filtering, read pairs were merged with BBMerge and taxonomically assigned with Kraken2 as above. Each read was either (1) assigned to a human-infecting virus (HV) taxon with Kraken2, (2) assigned to a non-HV taxon with Kraken2, or (3) not assigned to any taxon. All reads in category (2) were filtered out.

    Reads are assigned an HV status if (i) they are given an HV assignment by both Bowtie2 and Kraken2; or if (ii) a read is unassigned by Kraken2 but aligns to an HV taxon with Bowtie2 with a length-normalized alignment score at or above a user-defined threshold of 20 (i.e. alignmentScore/ln(readLength) >= 20).

    The number of reads assigned to each human-infecting virus taxon is calculated by summing all Bowtie2 assignments to that taxid and its taxonomic descendants, according to the NCBI taxonomy hierarchy. These read counts were then used to calculate RA(1%) estimates as described in Appendix 3. The taxids used to generate such estimates are documented in Table S5.
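
    The read-classification and counting rules in this description can be illustrated with a minimal Python sketch. It is not code from mgs-workflow: the function names, the Kraken2 status labels ("hv", "non_hv", "unassigned"), and the toy taxids are all hypothetical, and the inputs are assumed to have been parsed from Bowtie2 and Kraken2 output already.

```python
import math
from collections import defaultdict

# Threshold quoted in the description: alignmentScore / ln(readLength) >= 20.
SCORE_THRESHOLD = 20.0


def is_hv_read(bowtie2_hv, kraken2_status, alignment_score, read_length,
               threshold=SCORE_THRESHOLD):
    """Decide HV status for one read (hypothetical helper).

    bowtie2_hv: True if Bowtie2 aligned the read to an HV taxon.
    kraken2_status: "hv", "non_hv", or "unassigned" (labels are assumptions).
    """
    if not bowtie2_hv or kraken2_status == "non_hv":
        return False  # category (2) reads are filtered out
    if kraken2_status == "hv":
        return True  # rule (i): both tools agree on an HV assignment
    # Rule (ii): Kraken2 gave no assignment, so fall back to the
    # length-normalized Bowtie2 alignment score.
    return alignment_score / math.log(read_length) >= threshold


def clade_counts(direct_counts, parent_of):
    """Sum reads for each taxid and all of its descendants.

    direct_counts: {taxid: reads assigned directly to that taxid}
    parent_of: {taxid: parent taxid, or None for the root}
    """
    totals = defaultdict(int)
    for taxid, n_reads in direct_counts.items():
        node, seen = taxid, set()
        # Walk up to the root, crediting every ancestor along the way.
        while node is not None and node not in seen:
            totals[node] += n_reads
            seen.add(node)
            node = parent_of.get(node)
    return dict(totals)


def relative_abundance(virus_reads, total_reads):
    """Reads assigned to a virus divided by total sequencing reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return virus_reads / total_reads


# Toy taxonomy with made-up taxids: 1 is the root, 10 is a genus,
# and 11 and 12 are species under 10.
toy_parents = {1: None, 10: 1, 11: 10, 12: 10}
toy_counts = clade_counts({10: 1, 11: 3, 12: 2}, toy_parents)
toy_ra = relative_abundance(toy_counts[10], 1200)
```

    In this toy example, taxid 10 accumulates its own read plus the five reads assigned to its two child taxids, and the relative abundance is that clade total divided by an assumed 1,200 total reads.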
