100+ datasets found
  1. NCBI Virus

    • catalog.data.gov
    • datadiscovery.nlm.nih.gov
    • +2 more
    Updated Jun 19, 2025
    Cite
    National Library of Medicine (2025). NCBI Virus [Dataset]. https://catalog.data.gov/dataset/ncbi-virus
    Explore at:
    Dataset updated
    Jun 19, 2025
    Dataset provided by
    National Library of Medicine
    Description

    NCBI Virus is an integrative, value-added resource designed to support retrieval, display and analysis of a curated collection of virus sequences and large sequence datasets. Its goal is to increase the usability of viral sequence data archived in GenBank and other NCBI repositories. It incorporates content previously provided by the HIV-1, Human Protein Interaction Database; the Influenza Virus Resource; and Virus Variation.

  2. Viral genomes from GenBank (reference) - Comparative analysis of gene...

    • figshare.com
    application/x-gzip
    Updated Jun 3, 2023
    Cite
    Enrique Gonzalez Tortuero; Revathy Krishnamurthi; Heather Allison; Ian Goodhead; Chloë James (2023). Viral genomes from GenBank (reference) - Comparative analysis of gene prediction tools for viral genome annotation [Dataset]. http://doi.org/10.6084/m9.figshare.21353829.v1
    Explore at:
    Available download formats: application/x-gzip
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    figshare
    Authors
    Enrique Gonzalez Tortuero; Revathy Krishnamurthi; Heather Allison; Ian Goodhead; Chloë James
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The file "viral.genomic.gbk.tar.gz" contains all the RefSeq viral database information in GenBank format, used as the gold standard for the comparisons. It should be used as is when running the script "genecounter.py" to count the number of genes, and it is the second (mandatory) input file when counting true positives (TP), false positives (FP) and false negatives (FN) via "coordinateschecker.py". It can also be used for other evaluation purposes.
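The TP/FP/FN counting mentioned above can be illustrated with a minimal sketch. It does not reproduce "coordinateschecker.py"; the `Gene` type, the `confusion_counts` function, the exact-match rule (same start, end and strand) and the coordinates are all assumptions made for illustration, and the actual script may apply a different matching criterion.

```python
from typing import Iterable, NamedTuple, Tuple


class Gene(NamedTuple):
    """Hypothetical gene record: 1-based start/end and strand."""
    start: int
    end: int
    strand: str


def confusion_counts(reference: Iterable[Gene],
                     predicted: Iterable[Gene]) -> Tuple[int, int, int]:
    """Return (TP, FP, FN), assuming a prediction is correct only if it
    matches a reference gene exactly (an assumption, not the source's rule)."""
    ref = set(reference)
    pred = set(predicted)
    tp = len(ref & pred)   # predicted and present in the gold standard
    fp = len(pred - ref)   # predicted but absent from the gold standard
    fn = len(ref - pred)   # in the gold standard but not predicted
    return tp, fp, fn


# Made-up coordinates, for illustration only.
reference = [Gene(1, 300, "+"), Gene(400, 900, "-"), Gene(1000, 1500, "+")]
predicted = [Gene(1, 300, "+"), Gene(410, 900, "-"), Gene(2000, 2300, "+")]

tp, fp, fn = confusion_counts(reference, predicted)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fp, fn, round(precision, 3), round(recall, 3))
```

Exact matching is the strictest possible rule; a more lenient comparison (for example, matching on the stop coordinate only) would change the counts.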

  3. NCBI Virus BLAST Database

    • zenodo.org
    bin
    Updated Oct 26, 2022
    Cite
    Geoffrey Zahn (2022). NCBI Virus BLAST Database [Dataset]. http://doi.org/10.5281/zenodo.7250323
    Explore at:
    Available download formats: bin
    Dataset updated
    Oct 26, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Geoffrey Zahn
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Curated database of NCBI virus genomes, formatted for BLASTn

  4. Diamond NCBI Genbank Viral database for SOVAP

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Mar 22, 2023
    Cite
    Abdonaser Poursalavati (2023). Diamond NCBI Genbank Viral database for SOVAP [Dataset]. http://doi.org/10.5281/zenodo.7758200
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Mar 22, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Abdonaser Poursalavati
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Diamond NCBI Genbank Viral database

    Database type: Diamond database

    Database format version: 3

    Label: 2023-03-18_18-40-17

    Sequences: 3,191,190

    Sum length: 824,564,244

    Assembly summary entries: 58,201

    --------------------------------------------------------

    SOVAP v.1.3: GitHub

    Soil Virome Analysis Pipeline

    Description

    The study of viral communities in complex environmental samples, such as soil, can provide valuable insights into the diversity and functions of viral communities in the ecosystem. However, processing and analyzing virome data can be challenging and requires the integration of various computational tools and techniques.

    To address these challenges, we have developed the SOVAP pipeline, which uses a suite of state-of-the-art tools for processing, analyzing, and annotating viromics and metagenomics data.

    It uses Fastp and Centrifuge for preprocessing and contamination removal; Megahit and CD-HIT for assembling and clustering contigs; and geNomad, Diamond and Megan for identifying and annotating viral contigs. Additionally, the pipeline provides an estimate of the abundance of viral contigs, allowing for a more comprehensive understanding of the virome within the sample. The integration of these tools offers a reliable and effective means of taxonomic classification and annotation of viral contigs, helping researchers gain insight into the composition and function of the virome within the analyzed sample.

    By integrating the SOVAP pipeline with IMG/VR and geNomad, it is possible to identify a wider range of viruses, including those that were previously unknown.

    The batch-mode script allows for the processing of multiple datasets using the SOVAP pipeline. This feature is particularly useful for large-scale analyses, such as those involving multiple environmental samples or large sequencing datasets.

  5. Virus protein-related documents from NCBI Reference Sequence Database and...

    • scidb.cn
    Updated Aug 25, 2024
    Cite
    Xiao Yang; Ge Xing-Yi (2024). Virus protein-related documents from NCBI Reference Sequence Database and ICTV. [Dataset]. http://doi.org/10.57760/sciencedb.12215
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 25, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Xiao Yang; Ge Xing-Yi
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    Includes virus protein sequence files and their corresponding annotation files, as well as the ICTV virus classification table.

  6. Plant virus database (PVirDB)

    • researchdatafinder.qut.edu.au
    • researchdata.edu.au
    Updated May 30, 2022
    Cite
    Dr Marie-Emilie Gauthier (2022). Plant virus database (PVirDB) [Dataset]. https://researchdatafinder.qut.edu.au/display/n14699
    Explore at:
    Dataset updated
    May 30, 2022
    Dataset provided by
    Queensland University of Technology (QUT)
    Authors
    Dr Marie-Emilie Gauthier
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a custom-built BLAST database of higher plant viruses and viroids.

    A challenge associated with the bioinformatics analysis of sequencing data for diagnostic purposes is the dependency on sequence databases for taxonomic assignment of detections. Although public databases such as GenBank, maintained at NCBI, are the most up to date, their enormous size limits their portability across different computing resources. Moreover, sequencing data submitted by users to these public databases may not be accurate, and annotations provided in the GenBank record, such as the taxonomy assignment, which is crucial for accurate diagnosis, may be inaccurate and/or out of date. Additionally, the descriptors of the sequences in the public databases are not harmonized and lack taxonomic information, posing an additional challenge to validating sequence homology-based pathogen detections.

  7. RefSeq virus protein structure prediction database

    • uvaauas.figshare.com
    zip
    Updated Mar 19, 2025
    Cite
    W.E.W. Schravesande; Adriaan Verhage; M.V. Cligge; Raoul Frijters; H.A. van den Burg (2025). RefSeq virus protein structure prediction database [Dataset]. http://doi.org/10.21942/uva.28417079.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 19, 2025
    Dataset provided by
    University of Amsterdam / Amsterdam University of Applied Sciences
    Authors
    W.E.W. Schravesande; Adriaan Verhage; M.V. Cligge; Raoul Frijters; H.A. van den Burg
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Custom Virus database

    A custom foldseek target database was created, including all protein sequences derived from plant-infecting viruses currently found in the NCBI RefSeq database. In total, 8,191 protein sequences were extracted and used as templates for protein structure predictions. Colabfold v1.5.2 (using localcolabfold), which is based upon AlphaFold v2.3.1 (40), was used for protein model prediction.

    Settings: --random-seed 101 --num-seeds 3 --use-dropout --num-models 1 --num-recycle 8 --recycle-early-stop-tolerance 0.5

    No templates were used during the protein model prediction. The uniref30_2302 and colabfold_envdb_202108 databases were used to generate the multiple sequence alignments (https://colabfold.mmseqs.com/). The predicted structures were filtered based on the pLDDT value, resulting in a set of 7545 protein structures with a pLDDT ≥ 50.

    Files

    • modelling_stats.txt: tab-separated file containing the modelling statistics for each structure prediction
    • pdb_files/all: folder containing all pdb files resulting from the structure prediction
    • pdb_files/pLDDT50: folder containing all pdb files resulting from the structure prediction with a pLDDT score of 50 or higher
    • VIRAL_PROTEIN_PLANT_REFSEQ.fasta: fasta file containing all protein sequences extracted from plant-infecting viral genomes uploaded to the NCBI RefSeq database

  8. Datasets - Unveiling Host-Parasite Relationships through Conserved MITEs in...

    • zenodo.org
    Updated Aug 29, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ANA BELEN MARTIN CUADRADO (2024). Datasets - Unveiling Host-Parasite Relationships through Conserved MITEs in Prokaryote and Viral Genomes [Dataset]. http://doi.org/10.5281/zenodo.12572003
    Explore at:
    Dataset updated
    Aug 29, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    ANA BELEN MARTIN CUADRADO
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Title:

    Unveiling Host-Parasite Relationships through Conserved MITEs in Prokaryote and Viral Genomes

    Authors:

    Francisco Nadal-Molero(1), Riccardo Roselli(1), Silvia Garcia-Juan(1), Alicia Campos-Lopez(1), Ana-Belen Martin-Cuadrado(1*)

    SUPPLEMENTARY FILES

    Supplementary File S1. Sequences of cMITEs detected in Bacteria genomes (fasta format). The hosting microbial species and inferred NCBI-taxonomy are indicated in the name of each sequence. The structure of the MITE name is: “Accession|Genome|start|end|TSD|TIRlength|MITETracker_group|Lineage”.

    Supplementary File S2. Sequences of cMITEs detected in the Archaea genomes (fasta format). The hosting microbial species and inferred NCBI-taxonomy are indicated in the name of each sequence. The structure of the MITE name is: “Accession|Genome|start|end|TSD|TIRlength|MITETracker_group|Lineage”.

    Supplementary File S3. Sequences of vMITEs detected in the virus sequences from the NCBI and IMG/VR v.4.1 database (fasta format). Virus, microbial host (if known) and inferred NCBI-taxonomy are stated in the name of each sequence. The structure of the MITE name is: “Accession|Genome|start|end|TSD|TIRlength|MITETracker_group|Virus|Name|Host”.

    Supplementary File S4. Sequences of si-vMITEs detected in the virus sequences from the NCBI and IMG/VR v.4.1 database (fasta format). Virus, microbial host (if known) and inferred NCBI-taxonomy are stated in the name of each sequence. The structure of the MITE name is: “Accession|Genome|start|end|Ident.Method.by.DB|Host”.

    Supplementary Files S5. Cytoscape networks. (A) Figure 1A, (B) Figure 1B.

    Supplementary File S6. Sequences of cMITEs obtained from 5837 genomes of Neisseriales. The structure of the MITE name is: “Accession|NucleotideID|start|end|TSD|TIRlength|MITETracker_group|Genome|Lineage”.

    Supplementary File S7. Sequences of si-vMITEs obtained from 5837 genomes of Neisseriales. The structure of the MITE name is: “Accession|Genome|start|end|Host”.

    Supplementary File S8. Sequences of cMITEs obtained from 46051 genomes of Bacteroidota. The structure of the MITE name is: “Accession|NucleotideID|start|end|TSD|TIRlength|MITETracker_group|Genome|Lineage”.

    Supplementary File S9. Sequences of si-vMITEs obtained from 46051 genomes of Bacteroidota. The structure of the MITE name is: “Accession|Genome|start|end|Host”.
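The pipe-delimited MITE names described in the supplementary files can be split into named fields. The sketch below handles the S1/S2 layout (“Accession|Genome|start|end|TSD|TIRlength|MITETracker_group|Lineage”); the field keys, the `parse_mite_name` function and the example header are hypothetical, and it is an assumption that start and end are integers.

```python
from typing import Dict, Sequence, Union

# Field order taken from the S1/S2 description; the key spellings are ours.
CMITE_FIELDS = (
    "accession", "genome", "start", "end",
    "tsd", "tir_length", "mitetracker_group", "lineage",
)


def parse_mite_name(header: str,
                    fields: Sequence[str] = CMITE_FIELDS) -> Dict[str, Union[str, int]]:
    """Split a FASTA header on '|' into the named fields.

    The last field absorbs any extra '|' characters, in case a lineage
    string contains the delimiter (an assumption about the data).
    """
    parts = header.lstrip(">").strip().split("|", len(fields) - 1)
    if len(parts) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(parts)}")
    record: Dict[str, Union[str, int]] = dict(zip(fields, parts))
    for key in ("start", "end"):
        record[key] = int(record[key])  # assumed to be integer coordinates
    return record


# Made-up header, for illustration only.
example = ">ACC0001|Example genome|120|410|TA|15|G7|Bacteria;Exampleota"
record = parse_mite_name(example)
print(record["accession"], record["end"] - record["start"] + 1)
```

The other layouts (S3, S4, S6 to S9) can be handled by passing a different `fields` tuple in the order given in each file's description.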

  9. NCBI Virus - v3g7-abyx - Archive Repository

    • healthdata.gov
    application/rdfxml +5
    Updated Jul 16, 2025
    Cite
    (2025). NCBI Virus - v3g7-abyx - Archive Repository [Dataset]. https://healthdata.gov/dataset/NCBI-Virus-v3g7-abyx-Archive-Repository/49gk-bnyy
    Explore at:
    Available download formats: csv, application/rdfxml, tsv, json, xml, application/rssxml
    Dataset updated
    Jul 16, 2025
    Description

    This dataset tracks updates made to the dataset "NCBI Virus" and serves as a repository for previous versions of the data and metadata.

  10. COVID-19 Genome Sequence Dataset

    • registry.opendata.aws
    • catalog.midasnetwork.us
    Updated Jul 9, 2020
    Cite
    National Library of Medicine (NLM) (2020). COVID-19 Genome Sequence Dataset [Dataset]. https://registry.opendata.aws/ncbi-covid-19/
    Explore at:
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    National Library of Medicine (NLM) (http://nlm.nih.gov/)
    Description

    This repository within the ACTIV TRACE initiative houses a comprehensive collection of datasets related to SARS-CoV-2. The processing of SARS-CoV-2 Sequence Read Archive (SRA) files has been optimized to identify genetic variations in viral samples. This information is then presented in the Variant Call Format (VCF). Each VCF file corresponds to the SRA parent-run's accession ID. Additionally, the data is available in the parquet format, making it easier to search and filter using the Amazon Athena Service. The SARS-CoV-2 Variant Calling Pipeline is designed to handle new data every six hours, with updates to the AWS ODP bucket occurring daily.

  11. Data from: A global dataset of sequence, diversity and biosafety...

    • figshare.com
    txt
    Updated Jun 27, 2023
    Cite
    Ying Huang; Shunlong Wang; Hong Liu; Evans Atoni; Fei Wang; Wei Chen; Zhaolin Li; Sergio Rodriguez; Zhiming Yuan; Zhaoyan Ming; Han Xia (2023). A global dataset of sequence, diversity and biosafety recommendation of arbovirus and arthropod-specific virus [Dataset]. http://doi.org/10.6084/m9.figshare.22154573.v7
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 27, 2023
    Dataset provided by
    figshare
    Authors
    Ying Huang; Shunlong Wang; Hong Liu; Evans Atoni; Fei Wang; Wei Chen; Zhaolin Li; Sergio Rodriguez; Zhiming Yuan; Zhaoyan Ming; Han Xia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We built a comprehensive dataset of arboviruses and arthropod-specific viruses (ASVs) by curating worldwide available data from the Arbovirus Catalog, Section VIII-F of the Biosafety in Microbiological and Biomedical Laboratories (BMBL) 6th edition, the Virus Metadata Resource of the International Committee on Taxonomy of Viruses (ICTV), and GenBank. This dataset includes complete information on viral taxonomy, biological characteristics, vectors and vertebrate hosts, distribution, recommended biosafety levels, genome segments, and nucleotide/amino acid sequences. It is intended to support research on arboviruses and arthropod-specific viruses in viral vector/host prediction, disease outbreak risk warning, arbovirus/arthropod-specific interactions, phylogenetic and evolutionary relationships, and biosafety risk assessment.

    This global dataset of viral sequence, diversity, distribution, and biosafety recommendation for arbovirus and ASV contains a viral information file (.xlsx), a nucleic acid sequences file (.fna) and an amino acid sequences file (.faa), accessible from figshare [26]. The column details of the viral meta information file (.xlsx) are as follows (“NAV” in a field indicates that the value is not available):

    Taxonomy Information
    1. Virus_Group: (customized field) viruses in the database are divided into two groups: arbovirus and ASV. The former has both vertebrate and arthropod hosts; the latter has only arthropod hosts.
    2. Name: (source from GenBank) the virus name; each name represents a distinct virus.
    3. Acronym: (source from BMBL) acronym of the virus name.
    4. NCBI_Taxonomy_ID: (source from GenBank) taxonomy identifier of the virus from the NCBI Taxonomy Database.
    5. Isolate: (source from GenBank) isolate of the virus from NCBI GenBank.
    6. Unified_Isolate_Number: (customized field) renumbering of the field Isolate. Each isolate of the same virus is numbered.
    7. Species: (source from ICTV) species that the virus belongs to. The species of a virus normally differs from its name.
    8. Genus: (source from ICTV) genus that the virus belongs to.
    9. Family: (source from ICTV) family that the virus belongs to.

    Genome Information
    10. Segmented: (customized field) whether the genome of the virus is unsegmented (recorded as “no”) or segmented (recorded as “yes”). A virus with an unknown number of segments is recorded as “NAV”.
    11. Number_of_Segments: (source from GenBank) the theoretical number of segments of the virus.
    12. Molecule_Type: (source from GenBank) molecule type of the virus genome: ssRNA(+), ssRNA(-), ssRNA(+/-), dsRNA, RNA, ssDNA(+/-), dsDNA, etc.

    Sequence Information
    13. Accession: (source from GenBank) NCBI GenBank Accession of the nucleotide sequence.
    14. Locus: (source from GenBank) the locus name of the nucleotide sequence.
    15. SRA_Accession: (source from GenBank) NCBI SRA Accession of the nucleotide sequence.
    16. Submitters: (source from GenBank) submitters of the nucleotide sequence.
    17. Sequence_Type: (source from GenBank) whether the nucleotide sequence is a reference sequence (recorded as “RefSeq”) or a non-reference sequence (recorded as “GenBank”).
    18. BioSample: (source from GenBank) NCBI BioSample Accession of the nucleotide sequence.
    19. GenBank_Title: (source from GenBank) the field “DEFINITION” of the NCBI GenBank record of the sequence.
    20. Genotype: (source from GenBank) genotype of the nucleotide sequence.
    21. Segment: (source from GenBank) segment identifier of the nucleotide sequence.
    22. Unified_Segment_Number: (customized field) renumbering of the field Segment. Each segment is assigned a new number starting from 1. The single segment of an unsegmented virus is assigned 1.

    Host Information
    23. Host_Species: (customized field) the species of the dead-end host of the virus.
    24. Host_Genus: (customized field) the genus of the dead-end host of the virus.
    25. Host_Family: (customized field) the family of the dead-end host of the virus.
    26. Host: (source from GenBank) the field from the NCBI GenBank database that represents dead-end hosts or vectors.

    Biosafety Information
    27. Recommended_BSL: (customized field) recommended biosafety level of the laboratory for research on the virus (recorded as “2”, “3”, “4”, “NAV”).
    28. BMBL_Recommended_BSL: (source from BMBL) BMBL recommended biosafety level of the laboratory for research on the virus (recorded as “2”, “2 with 3 practices”, “2b”, “3”, “3a”, “3b”, “4”, “NAV”).
    29. Basis_of_Rating: (source from BMBL) risk assessment of the virus (recorded as “A1”, “A2”, “A3”, “A4”, “A7”, “IE”, “S”, “NAV”).
    30. Antigenic_Group: (source from BMBL) the antigenic group of the virus.
    31. Isolated: (customized field) whether the virus has been isolated (“Yes” or “No”).

    Source Information
    32. Latitude_and_Longitude: (source from GenBank) longitude and latitude of the virus isolation source.
    33. State_or_Province: (customized field) state or provincial administrative unit of the virus source.
    34. Geo_Location: (source from GenBank) geographical position of the virus source.
    35. Country_or_Region: (customized field) the country or region of the virus source.
    36. Isolation_Source: (source from GenBank) the organism from which the virus was collected.
    37. Collection_Date: (source from GenBank) the date on which the virus was collected.
    38. Submit_Date: (source from GenBank) the date on which the virus was submitted.
    39. Release_Date: (source from GenBank) the date on which the virus was released or last modified.

    References
    40. Publications: (customized field) the number of publications and literature covering the specific virus research.
    41. Accession_URL: (customized field) the DOI leading directly to the GenBank source.

    The nucleotide sequences file and amino acid sequences file are standard FASTA files. Each sequence record consists of two lines: a header and the content. The header contains two types of information, locus and accession, separated by '|'. The content is a specific nucleic acid or amino acid sequence. The fields in the header are defined as follows:
    1. Locus: NCBI GenBank LOCUS ID of the nucleotide sequence.
    2. Accession: NCBI GenBank Accession of the nucleotide sequence.
    Protein_ID: a protein sequence identification number (for the amino acid sequences file).
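The two-line record layout described above can be read with a short parser. This is a sketch under assumptions: the function name, the dictionary keys and the example records are hypothetical, and because the position of Protein_ID in the .faa headers is not spelled out, any fields beyond locus and accession are kept in a generic `extra` list.

```python
from typing import Dict, Iterable, List


def read_two_line_fasta(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Parse records made of one '>' header line and one sequence line.

    The header is split on '|' into locus and accession; further fields
    (for example a Protein_ID) are kept under 'extra'.
    """
    stripped = iter(line.strip() for line in lines if line.strip())
    records: List[Dict[str, object]] = []
    for header in stripped:
        if not header.startswith(">"):
            raise ValueError(f"expected a header line, got: {header[:30]!r}")
        sequence = next(stripped, None)
        if sequence is None or sequence.startswith(">"):
            raise ValueError(f"missing sequence for header {header!r}")
        fields = header[1:].split("|")
        records.append({
            "locus": fields[0],
            "accession": fields[1] if len(fields) > 1 else None,
            "extra": fields[2:],
            "sequence": sequence,
        })
    return records


# Made-up records, for illustration only.
example_lines = [
    ">LOC0001|AB000001.1",
    "ATGCATGCATGC",
    ">LOC0002|AB000002.1",
    "ATGAAATTTGGG",
]
records = read_two_line_fasta(example_lines)
print([(r["locus"], r["accession"], len(r["sequence"])) for r in records])
```

The resulting accessions can then be joined against the Accession column of the .xlsx metadata file.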

  12. NCBI Virus: Severe acute respiratory syndrome coronavirus 2 data hub

    • catalog.midasnetwork.us
    acc, csv, fasta, xml
    Updated Jul 6, 2023
    Cite
    MIDAS Coordination Center (2023). NCBI Virus: Severe acute respiratory syndrome coronavirus 2 data hub [Dataset]. https://catalog.midasnetwork.us/collection/167
    Explore at:
    Available download formats: fasta, xml, csv, acc
    Dataset updated
    Jul 6, 2023
    Dataset authored and provided by
    MIDAS Coordination Center
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Variables measured
    disease, COVID-19, pathogen, Homo sapiens, host organism, infectious disease, sequence collection, Severe acute respiratory syndrome coronavirus 2
    Dataset funded by
    National Institute of General Medical Sciences
    Description

    A data hub for searching, retrieving, and analyzing SARS-CoV-2 GenBank data.

  13. NCBI Genome

    • rrid.site
    • dknet.org
    • +1 more
    Updated Jul 6, 2025
    Cite
    (2025). NCBI Genome [Dataset]. http://identifiers.org/RRID:SCR_002474/resolver?q=*&i=rrid
    Explore at:
    Dataset updated
    Jul 6, 2025
    Description

    Database that organizes information on genomes including sequences, maps, chromosomes, assemblies, and annotations in six major organism groups: Archaea, Bacteria, Eukaryotes, Viruses, Viroids, and Plasmids. Genomes of over 1,200 organisms can be found in this database, representing both completely sequenced organisms and those for which sequencing is in progress. Users can browse by organism, and view genome maps and protein clusters. Links to other prokaryotic and archaeal genome projects, as well as BLAST tools and access to the rest of the NCBI online resources are available.

  14. List of NCBI accession numbers for viral and host sequences used in this...

    • figshare.com
    • plos.figshare.com
    csv
    Updated Sep 30, 2024
    Cite
    G. Eric Bastien; Rachel N. Cable; Cecelia Batterbee; A. J. Wing; Luis Zaman; Melissa B. Duhaime (2024). List of NCBI accession numbers for viral and host sequences used in this study. [Dataset]. http://doi.org/10.1371/journal.pcbi.1011649.s013
    Explore at:
    Available download formats: csv
    Dataset updated
    Sep 30, 2024
    Dataset provided by
    PLOS Computational Biology
    Authors
    G. Eric Bastien; Rachel N. Cable; Cecelia Batterbee; A. J. Wing; Luis Zaman; Melissa B. Duhaime
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    List of NCBI accession numbers for viral and host sequences used in this study.

  15. Data from: NCBI Taxonomy

    • rrid.site
    • dknet.org
    • +2 more
    Updated Jun 23, 2025
    Cite
    (2025). NCBI Taxonomy [Dataset]. http://identifiers.org/RRID:SCR_003256
    Explore at:
    Dataset updated
    Jun 23, 2025
    Description

    Database for a curated classification and nomenclature that contains the names of all organisms that are represented in the public sequence databases with at least one nucleotide or protein sequence. Data provided encompasses archaea, bacteria, eukaryota, viroids and viruses. The NCBI taxonomy database is not a primary source for taxonomic or phylogenetic information. Furthermore, the database does not follow a single taxonomic treatise but rather attempts to incorporate phylogenetic and taxonomic knowledge from a variety of sources, including the published literature, web-based databases, and the advice of sequence submitters and outside taxonomy experts. Consequently, the NCBI taxonomy database is not a phylogenetic or taxonomic authority and should not be cited as such.

  16. Influenza Virus Resource

    • dknet.org
    • neuinfo.org
    • +2 more
    Updated Jan 29, 2022
    Cite
    (2022). Influenza Virus Resource [Dataset]. http://identifiers.org/RRID:SCR_002984
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    Database of data obtained from the NIAID Influenza Genome Sequencing Project as well as from GenBank, combined with tools for flu sequence analysis and annotation. In addition, it provides links to other resources that contain flu sequences, publications and general information about flu viruses. Users can search the Flu database, build queries, retrieve sequences, and apply analysis tools. This includes selecting influenza sequences by virus, subtype, host, and other criteria; finding complete genome sets; aligning sequences with others in the database (up to 1000 sequences); viewing clustering and phylogenetic trees; BLAST searching a flu sequence against the database; and more.

  17. NCBI Genome

    • dknet.org
    Updated Aug 1, 2024
    Cite
    (2024). NCBI Genome [Dataset]. http://identifiers.org/RRID:SCR_002474
    Explore at:
    Dataset updated
    Aug 1, 2024
    Description

    Database that organizes information on genomes including sequences, maps, chromosomes, assemblies, and annotations in six major organism groups: Archaea, Bacteria, Eukaryotes, Viruses, Viroids, and Plasmids. Genomes of over 1,200 organisms can be found in this database, representing both completely sequenced organisms and those for which sequencing is in progress. Users can browse by organism, and view genome maps and protein clusters. Links to other prokaryotic and archaeal genome projects, as well as BLAST tools and access to the rest of the NCBI online resources are available.

  18. Data from: Genetic diversity and spread dynamics of SARS-CoV-2 variants...

    • data.niaid.nih.gov
    • search.dataone.org
    • +2 more
    zip
    Updated May 31, 2024
    Cite
    Desire Mtetwa (2024). Genetic diversity and spread dynamics of SARS-CoV-2 variants present in African populations [Dataset]. http://doi.org/10.5061/dryad.1c59zw42d
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2024
    Dataset provided by
    Chinhoyi University of Technology
    Authors
    Desire Mtetwa
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The dynamics of coronavirus disease-19 (COVID-19) have been extensively researched in many settings around the world, but little is known about these patterns in Africa. 7540 complete nucleotide genomes from 51 African nations were obtained from the National Center for Biotechnology Information (NCBI) and Global Initiative on Sharing Influenza Data (GISAID) databases and analysed to examine genetic diversity and spread dynamics of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) lineages circulating in Africa. Utilising a variety of clade and lineage nomenclature schemes, we looked at their diversity, and used maximum parsimony inference methods to recreate their evolutionary divergence and history. According to this study, only 465 of the 2610 Pango lineages found to have existed in the world circulated in Africa after three years of the COVID-19 pandemic outbreak, with five different lineages dominating at various points during the outbreak. We identified South Africa, Kenya, and Nigeria as key sources of viral transmissions between Sub-Saharan African nations. These findings provide insight into the viral strains that are circulating in Africa and their evolutionary patterns.

    Methods

    Dataset mining and workflow

    SARS-CoV-2 genome sequences collected from Africa were obtained from the NCBI database and the GISAID database on February 26, 2023. 24415 African sequences were retrieved from both databases to examine the number of lineages circulating within Africa. The two databases had only 8044 complete genome sequences combined from Africa, and these sequences, excluding those identified as low coverage using NextClade, were retrieved to determine spread dynamics. 5908 sequences from 23 African countries were available in NCBI and 2137 sequences from 41 African countries in the GISAID database. The sequences were aligned using the online version of the MAFFT multiple sequence alignment tool, with Wuhan-Hu-1 (MN908947.3) as the reference sequence, and sequences with more than 5.0% ambiguous letters were removed. Duplicates were removed using the goalign dedup software, and only high-quality African complete sequences remained (n=7540).

    Phylogenetic reconstruction

    Using IQ-TREE multicore software version v1.6.12 and NextClade, phylogeny reconstruction on the dataset was performed numerous times.

    Lineage classification

    PANGOLin, a web application, was used to classify sequences into their lineages. The objective was to determine the SARS-CoV-2 lineages circulating in Africa that are most important from an epidemiological perspective, as well as the lineage dynamics within and across the African continent, because this naming system integrates genetic and geographic data concerning SARS-CoV-2 dynamics.

    Phylogeographic reconstruction

    VOC, VOI and VUM were designated based on the WHO framework as of 20 January 2022. We included one lineage, namely A.23.1, and labelled it as VOI for the purposes of this analysis. This lineage was included because it demonstrated the continued evolution of African lineages into potentially more transmissible variants. VOI, VOC, and VUM that emerged on the African continent were marked. These were A.23.1 (VOI), B.1.351 and B.1.1.529 (VOC), and B.1.640 and B.1.525 (VUM). Genome sequences of these five lineages were extracted from the NCBI database for phylogeographic reconstruction. A similar approach to that described above (including alignment using online MAFFT) was employed. Phylogeographic reconstruction for all variants circulating in Africa and all VOI, VOC, and VUM was conducted using PASTML.
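The two quality-control steps in the workflow above (dropping sequences with more than 5.0% ambiguous letters, then removing duplicates) can be sketched in plain Python. This is a stand-in for illustration, not the goalign dedup tool the authors used. It assumes ungapped sequences, treats every letter other than A, C, G or T as ambiguous, and defines duplicates as identical sequence strings; the function names and example records are hypothetical.

```python
from typing import Iterable, List, Tuple


def ambiguous_fraction(sequence: str) -> float:
    """Fraction of letters that are not A, C, G or T (assumed definition)."""
    seq = sequence.upper()
    if not seq:
        return 1.0
    unambiguous = sum(seq.count(base) for base in "ACGT")
    return (len(seq) - unambiguous) / len(seq)


def filter_and_dedup(records: Iterable[Tuple[str, str]],
                     max_ambiguous: float = 0.05) -> List[Tuple[str, str]]:
    """Keep the first occurrence of each sequence whose ambiguous
    fraction does not exceed max_ambiguous."""
    seen = set()
    kept: List[Tuple[str, str]] = []
    for name, sequence in records:
        if ambiguous_fraction(sequence) > max_ambiguous:
            continue  # too many ambiguous letters
        key = sequence.upper()
        if key in seen:
            continue  # identical to a sequence already kept
        seen.add(key)
        kept.append((name, sequence))
    return kept


# Made-up 40-letter sequences, for illustration only.
clean = "ACGT" * 10
records = [
    ("seq_clean", clean),
    ("seq_duplicate", clean.lower()),        # same sequence, different case
    ("seq_one_n", "N" + clean[1:]),          # 2.5% ambiguous, kept
    ("seq_many_n", "NNNN" + clean[4:]),      # 10% ambiguous, removed
]
kept = filter_and_dedup(records)
print([name for name, _ in kept])
```

On aligned input, gap characters would need separate handling (for example, excluding them from the denominator), which this sketch does not attempt.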

  19. Viral reference data for PathoLive

    • zenodo.org
    application/gzip
    Updated Jan 24, 2020
    Simon H. Tausch (2020). Viral reference data for PathoLive [Dataset]. http://doi.org/10.5281/zenodo.2536788
    Available download formats: application/gzip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Simon H. Tausch
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Viral reference data for PathoLive, including GI numbers and taxonomic information for each sequence. The data were taken from the viral part of NCBI RefSeq, downloaded on 2016-07-06.

  20. dudesdb_201709 - Fungi and Virus - RefSeq - Complete Genomes

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Piro, Vitor C. (2020). dudesdb_201709 - Fungi and Virus - RefSeq - Complete Genomes [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1037287
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Piro, Vitor C.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    bowtie2 index and DUDes database for the set of fungal and viral complete genomes from NCBI RefSeq, dated 2017-09. The DUDes database was built from accession version numbers (DUDesDB.py option -m "av").
