100+ datasets found

NCBI Virus
catalog.data.gov
datadiscovery.nlm.nih.gov
+2more
Updated Jun 19, 2025
Cite
National Library of Medicine (2025). NCBI Virus [Dataset]. https://catalog.data.gov/dataset/ncbi-virus
Explore at:
Dataset updated
Jun 19, 2025
Dataset provided by
National Library of Medicine
Description
NCBI Virus is an integrative, value-added resource designed to support retrieval, display and analysis of a curated collection of virus sequences and large sequence datasets. Its goal is to increase the usability of viral sequence data archived in GenBank and other NCBI repositories. This resource includes resources previously included in HIV-1, Human Protein Interaction Database, Influenza Virus Resource, and Virus Variation.
Viral genomes from GenBank (reference) - Comparative analysis of gene...
figshare.com
application/x-gzip
Updated Jun 3, 2023
Cite
Enrique Gonzalez Tortuero; Revathy Krishnamurthi; Heather Allison; Ian Goodhead; Chloë James (2023). Viral genomes from GenBank (reference) - Comparative analysis of gene prediction tools for viral genome annotation [Dataset]. http://doi.org/10.6084/m9.figshare.21353829.v1
Explore at:
Available download formats: application/x-gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.21353829.v1
Dataset updated
Jun 3, 2023
Dataset provided by
figshare
Authors
Enrique Gonzalez Tortuero; Revathy Krishnamurthi; Heather Allison; Ian Goodhead; Chloë James
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The file "viral.genomic.gbk.tar.gz" contains all the RefSeq viral database information in GenBank format, used as the gold standard for the comparisons. It should be passed unmodified to the script "genecounter.py" to count the number of genes, and it is the second (mandatory) input file when counting true positives (TP), false positives (FP) and false negatives (FN) with "coordinateschecker.py". It can also be used for other evaluation purposes.
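The TP/FP/FN counting described above can be illustrated with a minimal sketch. This is not the code of "coordinateschecker.py"; the `(start, end, strand)` gene representation, the exact-match criterion, and the function name `count_matches` are assumptions made only for illustration.

```python
from typing import Iterable, Set, Tuple

# A gene is represented as (start, end, strand). This representation is an
# assumption for illustration, not the format used by coordinateschecker.py.
Gene = Tuple[int, int, str]


def count_matches(reference: Iterable[Gene], predicted: Iterable[Gene]) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) using exact coordinate matches."""
    ref_set: Set[Gene] = set(reference)
    pred_set: Set[Gene] = set(predicted)
    tp = len(ref_set & pred_set)  # predicted genes that are in the reference
    fp = len(pred_set - ref_set)  # predicted genes that are not in the reference
    fn = len(ref_set - pred_set)  # reference genes that were not predicted
    return tp, fp, fn


# Hypothetical coordinates: one prediction has a shifted start, one is spurious.
reference_genes = [(1, 300, "+"), (400, 900, "-"), (1000, 1500, "+")]
predicted_genes = [(1, 300, "+"), (410, 900, "-"), (1000, 1500, "+"), (1600, 1800, "+")]

tp, fp, fn = count_matches(reference_genes, predicted_genes)
print(tp, fp, fn)
```

With exact matching, a prediction whose start is shifted counts as both a false positive and a false negative; a tolerance-based comparison would need a different matching rule.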
NCBI Virus BLAST Database
zenodo.org
bin
Updated Oct 26, 2022
Cite
Geoffrey Zahn; Geoffrey Zahn (2022). NCBI Virus BLAST Database [Dataset]. http://doi.org/10.5281/zenodo.7250323
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.7250323
Dataset updated
Oct 26, 2022
Dataset provided by
Zenodo http://zenodo.org/
Authors
Geoffrey Zahn; Geoffrey Zahn
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Curated database of NCBI virus genomes, formatted for BLASTn
Diamond NCBI Genbank Viral database for SOVAP
zenodo.org
data.niaid.nih.gov
application/gzip
Updated Mar 22, 2023
Cite
Abdonaser Poursalavati; Abdonaser Poursalavati (2023). Diamond NCBI Genbank Viral database for SOVAP [Dataset]. http://doi.org/10.5281/zenodo.7758200
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.7758200
Dataset updated
Mar 22, 2023
Dataset provided by
Zenodo http://zenodo.org/
Authors
Abdonaser Poursalavati; Abdonaser Poursalavati
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Diamond NCBI Genbank Viral database

Database type: Diamond database

Database format version: 3

Label: 2023-03-18_18-40-17

Sequences: 3,191,190

Sum length: 824,564,244

Assembly summary entries: 58,201

--------------------------------------------------------

SOVAP v.1.3: GitHub

Soil Virome Analysis Pipeline

Description

The study of viral communities in complex environmental samples, such as soil, can provide valuable insights into the diversity and functions of viral communities in the ecosystem. However, processing and analyzing virome data can be challenging and requires the integration of various computational tools and techniques.

To address these challenges, we have developed the SOVAP pipeline, which uses a suite of state-of-the-art tools for processing, analyzing, and annotating viromics and metagenomics data.

It uses Fastp and Centrifuge for preprocessing and contamination removal, Megahit and CD-HIT for assembling and clustering contigs, and geNomad, Diamond and Megan for identifying and annotating viral contigs. Additionally, the pipeline estimates the abundance of viral contigs, allowing for a more comprehensive understanding of the virome within the sample. The integration of these tools offers a reliable and effective means of taxonomic classification and annotation of viral contigs, helping researchers gain insight into the composition and function of the virome within the analyzed sample.

By integrating the SOVAP pipeline with IMG/VR and geNomad, it is possible to identify a wider range of viruses, including those that were previously unknown.

The batch-mode script allows for the processing of multiple datasets using the SOVAP pipeline. This feature is particularly useful for large-scale analyses, such as those involving multiple environmental samples or large sequencing datasets.
Virus protein-related documents from NCBI Reference Sequence Database and...
scidb.cn
Updated Aug 25, 2024
Cite
Xiao Yang; Ge Xing-Yi (2024). Virus protein-related documents from NCBI Reference Sequence Database and ICTV. [Dataset]. http://doi.org/10.57760/sciencedb.12215
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.57760/sciencedb.12215
Dataset updated
Aug 25, 2024
Dataset provided by
Science Data Bank
Authors
Xiao Yang; Ge Xing-Yi
License
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Description
Includes virus protein sequence files and their corresponding annotation files, as well as the ICTV virus classification table.
Plant virus database (PVirDB)
researchdatafinder.qut.edu.au
researchdata.edu.au
Updated May 30, 2022
Cite
Dr Marie-Emilie Gauthier (2022). Plant virus database (PVirDB) [Dataset]. https://researchdatafinder.qut.edu.au/display/n14699
Explore at:
Dataset updated
May 30, 2022
Dataset provided by
Queensland University of Technology (QUT)
Authors
Dr Marie-Emilie Gauthier
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a custom-built blast database of higher plant viruses and viroids.

A challenge associated with the bioinformatics analysis of sequencing data for diagnostic purposes is the dependency on sequence databases for the taxonomic assignment of detections. Although public databases such as the GenBank database maintained at NCBI are the most up to date, their sheer size limits their portability across different computing resources. Moreover, sequencing data submitted by users to these public databases may not be accurate, and annotations provided in the GenBank record, such as the taxonomy assignment, which is crucial for accurate diagnosis, may be inaccurate and/or out of date. Additionally, the descriptors of the sequences in the public databases are not harmonized and lack taxonomic information, which makes it harder to validate sequence homology-based pathogen detections.
RefSeq virus protein structure prediction database
uvaauas.figshare.com
zip
Updated Mar 19, 2025
Cite
W.E.W. Schravesande; Adriaan Verhage; M.V. Cligge; Raoul Frijters; H.A. van den Burg (2025). RefSeq virus protein structure prediction database [Dataset]. http://doi.org/10.21942/uva.28417079.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.21942/uva.28417079.v1
Dataset updated
Mar 19, 2025
Dataset provided by
University of Amsterdam / Amsterdam University of Applied Sciences
Authors
W.E.W. Schravesande; Adriaan Verhage; M.V. Cligge; Raoul Frijters; H.A. van den Burg
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Custom Virus database

A custom foldseek target database was created, including all protein sequences derived from plant-infecting viruses currently found in the NCBI RefSeq database. In total, 8,191 protein sequences were extracted and used as template for protein structure predictions. Colabfold v1.5.2 (using localcolabfold), which is based upon AlphaFold v2.3.1 (40), was used for protein model prediction.

Setting: --random-seed 101 --num-seeds 3 --use-dropout --num-models 1 --num-recycle 8 --recycle-early-stop-tolerance 0.5

No templates were used during the protein model prediction. The uniref30_2302 and colabfold_envdb_202108 databases were used to generate the multiple sequence alignments (https://colabfold.mmseqs.com/). The predicted structures were filtered based on the pLDDT value, resulting in a set of 7545 protein structures with a pLDDT ≥ 50.

Files

modelling_stats.txt: tab-separated file containing the modelling statistics for each structure prediction
pdb_files/all: folder containing all pdb files resulting from the structure prediction
pdb_files/pLDDT50: folder containing all pdb files resulting from the structure prediction having a pLDDT score of 50 or higher
VIRAL_PROTEIN_PLANT_REFSEQ.fasta: fasta file containing all protein sequences extracted from plant-infecting viral genomes uploaded in the NCBI RefSeq database
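A pLDDT filter of the kind described above can be sketched as follows. AlphaFold-style PDB files store the per-residue pLDDT in the B-factor column, so a per-model score can be taken as the mean over C-alpha atoms. Whether the dataset authors averaged this way is an assumption, as are the helper names `mean_plddt` and `atom_line` and the two toy models.

```python
from statistics import mean


def mean_plddt(pdb_text: str) -> float:
    """Average the B-factor column (PDB columns 61-66) over C-alpha atoms."""
    scores = []
    for line in pdb_text.splitlines():
        # Atom name occupies PDB columns 13-16 (slice 12:16).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no CA atoms found")
    return mean(scores)


def atom_line(serial: int, resseq: int, plddt: float) -> str:
    """Build a fixed-width PDB ATOM record for a toy C-alpha atom."""
    return (f"ATOM  {serial:>5}  CA  ALA A{resseq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")


# Two hypothetical two-residue models with different confidence levels.
models = {
    "model_a": "\n".join([atom_line(1, 1, 80.0), atom_line(2, 2, 60.0)]),
    "model_b": "\n".join([atom_line(1, 1, 40.0), atom_line(2, 2, 50.0)]),
}

# Keep only models whose mean pLDDT is at least 50, mirroring the threshold above.
kept = [name for name, text in models.items() if mean_plddt(text) >= 50]
print(kept)
```

Reading fixed columns directly keeps the sketch dependency-free; for real files a structure parser would be more robust to unusual records.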
Datasets - Unveiling Host-Parasite Relationships through Conserved MITEs in...
zenodo.org
Updated Aug 29, 2024
Cite
ANA BELEN MARTIN CUADRADO; ANA BELEN MARTIN CUADRADO (2024). Datasets - Unveiling Host-Parasite Relationships through Conserved MITEs in Prokaryote and Viral Genomes [Dataset]. http://doi.org/10.5281/zenodo.12572003
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.12572003
Dataset updated
Aug 29, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
ANA BELEN MARTIN CUADRADO; ANA BELEN MARTIN CUADRADO
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Title:

Unveiling Host-Parasite Relationships through Conserved MITEs in Prokaryote and Viral Genomes

Authors:

Francisco Nadal-Molero⁽¹⁾, Riccardo Roselli⁽¹⁾, Silvia Garcia-Juan⁽¹⁾, Alicia Campos-Lopez⁽¹⁾, Ana-Belen Martin-Cuadrado⁽¹*⁾

SUPPLEMENTARY FILES

Supplementary File S1. Sequences of cMITEs detected in Bacteria genomes (fasta format). The hosting microbial species and inferred NCBI-taxonomy are indicated in the name of each sequence. The structure of the MITE name is: “Accession|Genome|start|end|TSD|TIRlength|MITETracker_group|Lineage”.

Supplementary File S2. Sequences of cMITEs detected in the Archaea genomes (fasta format). The hosting microbial species and inferred NCBI-taxonomy are indicated in the name of each sequence. The structure of the MITE name is: “Accession|Genome|start|end|TSD|TIRlength|MITETracker_group|Lineage”.

Supplementary File S3. Sequences of vMITEs detected in the virus sequences from the NCBI and IMG/VR v.4.1 database (fasta format). Virus, microbial host (if known) and inferred NCBI-taxonomy are stated in the name of each sequence. The structure of the MITE name is: “Accession|Genome|start|end|TSD|TIRlength|MITETracker_group|Virus|Name|Host”.

Supplementary File S4. Sequences of si-vMITEs detected in the virus sequences from the NCBI and IMG/VR v.4.1 database (fasta format). Virus, microbial host (if known) and inferred NCBI-taxonomy are stated in the name of each sequence. The structure of the MITE name is: “Accession|Genome|start|end|Ident.Method.by.DB|Host”.

Supplementary Files S5. Cytoscape networks. (A) Figure 1A, (B) Figure 1B.

Supplementary File S6. Sequences of cMITEs obtained from 5837 genomes of Neisseriales. The structure of the MITE name is: “Accession|NucleotideID|start|end|TSD|TIRlength|MITETracker_group|Genome|Lineage”.

Supplementary File S7. Sequences of si-vMITEs obtained from 5837 genomes of Neisseriales. The structure of the MITE name is: “Accession|Genome|start|end|Host”.

Supplementary File S8. Sequences of cMITEs obtained from 46051 genomes of Bacteroidota. The structure of the MITE name is: “Accession|NucleotideID|start|end|TSD|TIRlength|MITETracker_group|Genome|Lineage”.

Supplementary File S9. Sequences of si-vMITEs obtained from 46051 genomes of Bacteroidota. The structure of the MITE name is: “Accession|Genome|start|end|Host”.
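
The pipe-delimited MITE names listed above lend themselves to simple parsing. The sketch below splits a name into the fields given for Supplementary Files S1 and S2; the example name is invented for illustration, and treating only `start` and `end` as integers is an assumption, since the source does not state the types of the other fields.

```python
from typing import Dict, List, Union

# Field order as given for Supplementary Files S1 and S2.
CMITE_FIELDS = ["Accession", "Genome", "start", "end", "TSD",
                "TIRlength", "MITETracker_group", "Lineage"]


def parse_mite_name(name: str, fields: List[str] = CMITE_FIELDS) -> Dict[str, Union[str, int]]:
    """Split a '|'-delimited MITE name into a dict keyed by field name."""
    # Limit the number of splits so that any '|' inside the last field is preserved.
    parts = name.lstrip(">").strip().split("|", len(fields) - 1)
    if len(parts) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(parts)}")
    record: Dict[str, Union[str, int]] = dict(zip(fields, parts))
    for key in ("start", "end"):
        record[key] = int(record[key])
    return record


# Hypothetical name, not taken from the supplementary files.
example = ">ACC0001|Example_genome|120|480|TA|25|group_7|Bacteria;Proteobacteria"
record = parse_mite_name(example)
print(record["Accession"], record["end"] - record["start"])
```

The other supplementary files use different field lists, which can be passed through the `fields` argument.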
NCBI Virus - v3g7-abyx - Archive Repository
healthdata.gov
application/rdfxml +5
Updated Jul 16, 2025
Cite
(2025). NCBI Virus - v3g7-abyx - Archive Repository [Dataset]. https://healthdata.gov/dataset/NCBI-Virus-v3g7-abyx-Archive-Repository/49gk-bnyy
Explore at:
Available download formats: csv, application/rdfxml, tsv, json, xml, application/rssxml
Dataset updated
Jul 16, 2025
Description
This dataset tracks the updates made on the dataset "NCBI Virus" as a repository for previous versions of the data and metadata.
COVID-19 Genome Sequence Dataset
registry.opendata.aws
catalog.midasnetwork.us
Updated Jul 9, 2020
Cite
National Library of Medicine (NLM) (2020). COVID-19 Genome Sequence Dataset [Dataset]. https://registry.opendata.aws/ncbi-covid-19/
Explore at:
Dataset updated
Jul 9, 2020
Dataset provided by
National Library of Medicine (NLM) http://nlm.nih.gov/
Description
This repository within the ACTIV TRACE initiative houses a comprehensive collection of datasets related to SARS-CoV-2. The processing of SARS-CoV-2 Sequence Read Archive (SRA) files has been optimized to identify genetic variations in viral samples. This information is then presented in the Variant Call Format (VCF). Each VCF file corresponds to the SRA parent-run's accession ID. Additionally, the data is available in the parquet format, making it easier to search and filter using the Amazon Athena Service. The SARS-CoV-2 Variant Calling Pipeline is designed to handle new data every six hours, with updates to the AWS ODP bucket occurring daily.
Data from: A global dataset of sequence, diversity and biosafety...
figshare.com
txt
Updated Jun 27, 2023
Cite
Ying Huang; Shunlong Wang; Hong Liu; Evans Atoni; Fei Wang; Wei Chen; Zhaolin Li; Sergio Rodriguez; Zhiming Yuan; Zhaoyan Ming; Han Xia (2023). A global dataset of sequence, diversity and biosafety recommendation of arbovirus and arthropod-specific virus [Dataset]. http://doi.org/10.6084/m9.figshare.22154573.v7
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.22154573.v7
Dataset updated
Jun 27, 2023
Dataset provided by
figshare
Authors
Ying Huang; Shunlong Wang; Hong Liu; Evans Atoni; Fei Wang; Wei Chen; Zhaolin Li; Sergio Rodriguez; Zhiming Yuan; Zhaoyan Ming; Han Xia
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We built a comprehensive dataset of arboviruses and arthropod-specific viruses (ASV) by curating worldwide available data from the Arbovirus Catalog, Section VIII-F of the Biosafety in Microbiological and Biomedical Laboratories (BMBL) 6th edition, the Virus Metadata Resource of the International Committee on Taxonomy of Viruses (ICTV), and GenBank. The dataset includes complete information on viral taxonomy, biological characteristics, vectors and vertebrate hosts, distribution, recommended biosafety levels, genome segments, and nucleotide/amino acid sequences. It is intended to support research on arboviruses and arthropod-specific viruses in viral vector/host prediction, disease outbreak risk warning, arbovirus/arthropod-specific interactions, phylogenetic and evolutionary relationships, and biosafety risk assessment.

This global dataset of viral sequence, diversity, distribution, and biosafety recommendation for arbovirus and ASV contains a viral information file (.xlsx), a nucleic acid sequences file (.fna) and an amino acid sequences file (.faa), accessible from figshare [26]. The columns of the viral meta information file (.xlsx) are described below (“NAV” in a field indicates that the value is not available):

Taxonomy Information
1. Virus_Group: (customized field) viruses in the database are divided into two groups: arbovirus and ASV. The former has both vertebrate and arthropod hosts, the latter has only arthropod hosts.
2. Name: (source from GenBank) the virus name; each name represents a distinct virus.
3. Acronym: (source from BMBL) acronym of virus name.
4. NCBI_Taxonomy_ID: (source from GenBank) taxonomy identifier of virus from NCBI Taxonomy Database.
5. Isolate: (source from GenBank) isolate of virus from NCBI GenBank.
6. Unified_Isolate_Number: (customized field) renumbering of the field Isolate. Each isolate of the same virus is numbered.
7. Species: (source from ICTV) species that the virus belongs to. Species names normally differ from virus names.
8. Genus: (source from ICTV) genus that the virus belongs to.
9. Family: (source from ICTV) family that the virus belongs to.

Genome Information
10. Segmented: (customized field) whether the genome of the virus is unsegmented (recorded as “no”) or segmented (recorded as “yes”). A virus with an unknown number of segments is recorded as “NAV”.
11. Number_of_Segments: (source from GenBank) the theoretical number of segments of the virus.
12. Molecule_Type: (source from GenBank) molecule type of the virus genome: ssRNA(+), ssRNA(-), ssRNA(+/-), dsRNA, RNA, ssDNA(+/-), dsDNA, etc.

Sequence Information
13. Accession: (source from GenBank) NCBI GenBank Accession of the nucleotide sequence.
14. Locus: (source from GenBank) the locus name of the nucleotide sequence.
15. SRA_Accession: (source from GenBank) NCBI SRA Accession of the nucleotide sequence.
16. Submitters: (source from GenBank) submitters of the nucleotide sequence.
17. Sequence_Type: (source from GenBank) whether the nucleotide sequence is a reference sequence (recorded as “RefSeq”) or a non-reference sequence (recorded as “GenBank”).
18. BioSample: (source from GenBank) NCBI BioSample Accession of the nucleotide sequence.
19. GenBank_Title: (source from GenBank) the field “DEFINITION” of NCBI GenBank database of the sequence.
20. Genotype: (source from GenBank) genotype of the nucleotide sequence.
21. Segment: (source from GenBank) segment identifier of the nucleotide sequence.
22. Unified_Segment_Number: (customized field) renumbering of the field Segment. Each segment is assigned a new number starting from 1. The single segment of an unsegmented virus is assigned 1.

Host Information
23. Host_Species: (customized field) the species of the dead-end host of the virus.
24. Host_Genus: (customized field) the genus of the dead-end host of the virus.
25. Host_Family: (customized field) the family of the dead-end host of the virus.
26. Host: (source from GenBank) the field from the NCBI GenBank database that represents dead-end host or vectors.

Biosafety Information
27. Recommended_BSL: (customized field) recommended biosafety level of laboratory to research the virus (recorded as “2”, “3”, “4”, “NAV”).
28. BMBL_Recommended_BSL: (source from BMBL) BMBL recommended biosafety level of laboratory to research the virus (recorded as “2”, “2 with 3 practices”, “2b”, “3”, “3a”, “3b”, “4”, “NAV”).
29. Basis_of_Rating: (source from BMBL) risk assessment of the virus (recorded as “A1”, “A2”, “A3”, “A4”, “A7”, “IE”, “S”, “NAV”).
30. Antigenic_Group: (source from BMBL) the antigenic group of the virus.
31. Isolated: (customized field) whether the virus has been isolated (“Yes” or “No”).

Source Information
32. Latitude_and_Longitude: (source from GenBank) latitude and longitude of the virus isolation source.
33. State_or_Province: (customized field) state or provincial administrative unit of the virus source.
34. Geo_Location: (source from GenBank) geographical position of the virus source.
35. Country_or_Region: (customized field) the country or region of the virus source.
36. Isolation_Source: (source from GenBank) the organism which the virus was collected from.
37. Collection_Date: (source from GenBank) the date that the virus was collected.
38. Submit_Date: (source from GenBank) the date that the virus was submitted.
39. Release_Date: (source from GenBank) the date that the virus was released or last modified.

References
40. Publications: (customized field) the number of publications and literature covering the specific virus research.
41. Accession_URL: (customized field) the DOI leading directly to the GenBank source.

The nucleotide sequences file and amino acid sequences file are standard FASTA files. Each sequence record consists of two lines: a header and the content. The header contains the locus and the accession, separated by '|'. The content is the nucleic acid or amino acid sequence itself. The header fields are defined as follows:
1. Locus: NCBI GenBank LOCUS ID of the nucleotide sequence.
2. Accession: NCBI GenBank Accession of the nucleotide sequence.
Protein_ID: a protein sequence identification number (for amino acid sequences file).
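The header layout described above can be read with a few lines of Python. This is a minimal sketch: the order "locus, then accession" follows the field list above, placing any further fields (such as Protein_ID) after them is an assumption, and the identifiers in the example are invented.

```python
from typing import Dict, List, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs; also tolerates wrapped sequence lines."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_header(header: str) -> Dict[str, object]:
    """Split a 'locus|accession[|...]' header into named parts."""
    fields = header.split("|")
    if len(fields) < 2:
        raise ValueError(f"expected at least locus and accession: {header!r}")
    # Anything after the first two fields is kept as-is (assumed extra fields).
    return {"locus": fields[0], "accession": fields[1], "extra": fields[2:]}


# Hypothetical two-record file; identifiers are placeholders, not real accessions.
sample = ">LOCUS01|ACC00001.1\nATGCATGC\n>LOCUS02|ACC00002.1\nGGGTTTAA\n"
for header, sequence in read_fasta(sample):
    info = split_header(header)
    print(info["locus"], info["accession"], len(sequence))
```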
NCBI Virus: Severe acute respiratory syndrome coronavirus 2 data hub
catalog.midasnetwork.us
acc, csv, fasta, xml
Updated Jul 6, 2023
Cite
MIDAS Coordination Center (2023). NCBI Virus: Severe acute respiratory syndrome coronavirus 2 data hub [Dataset]. https://catalog.midasnetwork.us/collection/167
Explore at:
Available download formats: fasta, xml, csv, acc
Dataset updated
Jul 6, 2023
Dataset authored and provided by
MIDAS Coordination Center
License
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Variables measured
disease, COVID-19, pathogen, Homo sapiens, host organism, infectious disease, sequence collection, Severe acute respiratory syndrome coronavirus 2
Dataset funded by
National Institute of General Medical Sciences
Description
A data hub for searching, retrieving, and analyzing SARS-CoV-2 GenBank data.
NCBI Genome
rrid.site
dknet.org
+1more
Updated Jul 6, 2025
Cite
(2025). NCBI Genome [Dataset]. http://identifiers.org/RRID:SCR_002474/resolver?q=*&i=rrid
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_002474 https://identifiers.org/RRID:SCR_002474/resolver?q=*&i=rrid
Dataset updated
Jul 6, 2025
Description
Database that organizes information on genomes including sequences, maps, chromosomes, assemblies, and annotations in six major organism groups: Archaea, Bacteria, Eukaryotes, Viruses, Viroids, and Plasmids. Genomes of over 1,200 organisms can be found in this database, representing both completely sequenced organisms and those for which sequencing is in progress. Users can browse by organism, and view genome maps and protein clusters. Links to other prokaryotic and archaeal genome projects, as well as BLAST tools and access to the rest of the NCBI online resources are available.
List of NCBI accession numbers for viral and host sequences used in this...
figshare.com
plos.figshare.com
csv
Updated Sep 30, 2024
Cite
G. Eric Bastien; Rachel N. Cable; Cecelia Batterbee; A. J. Wing; Luis Zaman; Melissa B. Duhaime (2024). List of NCBI accession numbers for viral and host sequences used in this study. [Dataset]. http://doi.org/10.1371/journal.pcbi.1011649.s013
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.1371/journal.pcbi.1011649.s013
Dataset updated
Sep 30, 2024
Dataset provided by
PLOS Computational Biology
Authors
G. Eric Bastien; Rachel N. Cable; Cecelia Batterbee; A. J. Wing; Luis Zaman; Melissa B. Duhaime
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
List of NCBI accession numbers for viral and host sequences used in this study.
Data from: NCBI Taxonomy
rrid.site
dknet.org
+2more
Updated Jun 23, 2025
Cite
(2025). NCBI Taxonomy [Dataset]. http://identifiers.org/RRID:SCR_003256
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_003256
Dataset updated
Jun 23, 2025
Description
Database for a curated classification and nomenclature that contains the names of all organisms that are represented in the public sequence databases with at least one nucleotide or protein sequence. Data provided encompasses archaea, bacteria, eukaryota, viroids and viruses. The NCBI taxonomy database is not a primary source for taxonomic or phylogenetic information. Furthermore, the database does not follow a single taxonomic treatise but rather attempts to incorporate phylogenetic and taxonomic knowledge from a variety of sources, including the published literature, web-based databases, and the advice of sequence submitters and outside taxonomy experts. Consequently, the NCBI taxonomy database is not a phylogenetic or taxonomic authority and should not be cited as such.
Influenza Virus Resource
dknet.org
neuinfo.org
+2more
Updated Jan 29, 2022
Cite
(2022). Influenza Virus Resource [Dataset]. http://identifiers.org/RRID:SCR_002984
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_002984 https://identifiers.org/RRID:SCR_002984/resolver
Dataset updated
Jan 29, 2022
Description
Database of data obtained from the NIAID Influenza Genome Sequencing Project as well as from GenBank, combined with tools for flu sequence analysis and annotation. In addition, it provides links to other resources that contain flu sequences, publications and general information about flu viruses. Users can search the Flu database, build queries, retrieve sequences, and apply analysis tools. This includes selecting influenza sequences by virus, subtype, host, and other criteria, finding complete genome sets, aligning sequence and others in the database (up to 1000 sequences), viewing clustering and phylogenetic trees, BLAST searching a flu sequence against the database, and more.
NCBI Genome
dknet.org
Updated Aug 1, 2024
Cite
(2024). NCBI Genome [Dataset]. http://identifiers.org/RRID:SCR_002474
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_002474
Dataset updated
Aug 1, 2024
Description
Database that organizes information on genomes including sequences, maps, chromosomes, assemblies, and annotations in six major organism groups: Archaea, Bacteria, Eukaryotes, Viruses, Viroids, and Plasmids. Genomes of over 1,200 organisms can be found in this database, representing both completely sequenced organisms and those for which sequencing is in progress. Users can browse by organism, and view genome maps and protein clusters. Links to other prokaryotic and archaeal genome projects, as well as BLAST tools and access to the rest of the NCBI online resources are available.
Data from: Genetic diversity and spread dynamics of SARS-CoV-2 variants...
data.niaid.nih.gov
search.dataone.org
+2more
zip
Updated May 31, 2024
Cite
Desire Mtetwa (2024). Genetic diversity and spread dynamics of SARS-CoV-2 variants present in African populations [Dataset]. http://doi.org/10.5061/dryad.1c59zw42d
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.1c59zw42d
Dataset updated
May 31, 2024
Dataset provided by
Chinhoyi University of Technology
Authors
Desire Mtetwa
License
https://spdx.org/licenses/CC0-1.0.html
Description
The dynamics of coronavirus disease-19 (COVID-19) have been extensively researched in many settings around the world, but little is known about these patterns in Africa. 7540 complete genomes from 51 African nations were obtained from the National Center for Biotechnology Information (NCBI) and Global Initiative on Sharing Influenza Data (GISAID) databases and analysed to examine the genetic diversity and spread dynamics of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) lineages circulating in Africa. Using a variety of clade and lineage nomenclature schemes, we looked at their diversity, and used maximum parsimony inference methods to reconstruct their evolutionary divergence and history. According to this study, only 465 of the 2610 Pango lineages found to have existed in the world circulated in Africa after three years of the COVID-19 pandemic, with five different lineages dominating at various points during the outbreak. We identified South Africa, Kenya, and Nigeria as key sources of viral transmission between Sub-Saharan African nations. These findings provide insight into the viral strains that are circulating in Africa and their evolutionary patterns.

Methods

Dataset mining and workflow
SARS-CoV-2 genome sequences collected from Africa were obtained from the NCBI database and the GISAID database on February 26, 2023. 24415 African sequences were retrieved from both databases to examine the number of lineages circulating within Africa. The two databases combined had only 8044 complete genome sequences from Africa; these sequences, excluding those flagged as low coverage by NextClade, were retrieved to determine spread dynamics. 5908 sequences from 23 African countries were available in the NCBI database and 2137 sequences from 41 African countries in the GISAID database. The sequences were aligned using the online version of the MAFFT multiple sequence alignment tool, with Wuhan-Hu-1 (MN908947.3) as the reference sequence, and sequences with more than 5.0% ambiguous letters were removed. Duplicates were removed using goalign dedup, and only high-quality complete African sequences remained (n=7540).

Phylogenetic reconstruction
Using IQ-TREE multicore software version v1.6.12 and NextClade, phylogeny reconstruction on the dataset was performed numerous times.

Lineage classification
PANGOLin, a web application, was used to classify sequences into their lineages. The objective was to determine the SARS-CoV-2 lineages circulating in Africa that are most important from an epidemiological perspective, as well as the lineage dynamics within and across the African continent, because this naming system integrates genetic and geographic data concerning SARS-CoV-2 dynamics.

Phylogeographic reconstruction
VOC, VOI, and VUM were designated based on the WHO framework as of 20 January 2022. We included one lineage, namely A.23.1, and labelled it as VOI for the purposes of this analysis. This lineage was included because it demonstrated the continued evolution of African lineages into potentially more transmissible variants. VOI, VOC, and VUM that emerged on the African continent were marked. These were A.23.1 (VOI), B.1.351 and B.1.1.529 (VOC), and B.1.640 and B.1.525 (VUM). Genome sequences of these five lineages were extracted from the NCBI database for phylogeographic reconstruction. A similar approach to that described above (including alignment using online MAFFT) was employed. Phylogeographic reconstruction for all variants circulating in Africa and all VOI, VOC, and VUM was conducted using PASTML.
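The quality filter and deduplication steps described above can be sketched in plain Python. This is a stand-in for illustration, not the goalign tool used in the study: counting every character other than A, C, G, T as ambiguous and treating only exact sequence matches as duplicates are both assumptions.

```python
from typing import List, Tuple


def ambiguous_fraction(seq: str) -> float:
    """Fraction of characters that are not A, C, G or T (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 1.0  # treat an empty sequence as fully ambiguous
    return sum(1 for base in seq if base not in "ACGT") / len(seq)


def filter_and_dedup(records: List[Tuple[str, str]],
                     max_ambiguous: float = 0.05) -> List[Tuple[str, str]]:
    """Drop sequences above the ambiguity threshold, then exact duplicates."""
    seen = set()
    kept = []
    for name, seq in records:
        if ambiguous_fraction(seq) > max_ambiguous:
            continue
        key = seq.upper()
        if key in seen:
            continue  # keep only the first occurrence of each sequence
        seen.add(key)
        kept.append((name, seq))
    return kept


# Toy 40-base sequences: "b" duplicates "a", "c" has 10% N, "d" has 2.5% N.
records = [
    ("a", "ACGT" * 10),
    ("b", "ACGT" * 10),
    ("c", "ACGT" * 9 + "NNNN"),
    ("d", "ACGT" * 9 + "ACGN"),
]
print([name for name, _ in filter_and_dedup(records)])
```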
Viral reference data for PathoLive
zenodo.org
application/gzip
Updated Jan 24, 2020
Cite
Simon H. Tausch; Simon H. Tausch (2020). Viral reference data for PathoLive [Dataset]. http://doi.org/10.5281/zenodo.2536788
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.2536788
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo http://zenodo.org/
Authors
Simon H. Tausch; Simon H. Tausch
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Viral reference data for PathoLive including GI numbers and taxonomic information per sequence. Data taken from the viral part of the NCBI RefSeq downloaded on 2016-07-06.
dudesdb_201709 - Fungi and Virus - RefSeq - Complete Genomes
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
Cite
Piro, Vitor C. (2020). dudesdb_201709 - Fungi and Virus - RefSeq - Complete Genomes [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1037287
Explore at:
Dataset updated
Jan 24, 2020
Dataset authored and provided by
Piro, Vitor C.
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
bowtie2 index and dudes database for the set of Fungal and Viral complete genomes from NCBI RefSeq, dating from 2017-09. The dudes database was made based on accession version numbers (DUDesDB.py option -m "av").
