100+ datasets found
  1. COG-UK Viral Genome Sequences

    • healthdatagateway.org
    unknown
    Updated Oct 8, 2024
    COG-UK Consortium (2024). COG-UK Viral Genome Sequences [Dataset]. http://doi.org/10.1016/S2666-5247(20)30054-9
    Available download formats: unknown
    Dataset updated
    Oct 8, 2024
    Dataset provided by
    COVID-19 Genomics UK Consortium
    Authors
    COG-UK Consortium
    License

    https://www.cogconsortium.uk/data/

    Description

    The current COVID-19 pandemic, caused by the SARS-CoV-2 virus, represents a major threat to health in the UK and globally. Fully understanding the transmission and evolution of the virus requires sequencing and analysing viral genomes at scale and at speed. The number of samples calls for a rapid and robust increase in the UK’s pathogen genome sequencing capacity.

    To provide this increased capacity to collect, sequence and analyse the whole genomes of virus samples in the UK, the COVID-19 Genomics UK (COG-UK) consortium is pooling world-leading genomics knowledge and expertise from the four UK Public Health Agencies, multiple regional university hubs, and large sequencing centres such as the Wellcome Sanger Institute.

  2. Viral genomes from GenBank (reference) - Comparative analysis of gene...

    • figshare.com
    application/x-gzip
    Updated Jun 3, 2023
    Enrique Gonzalez Tortuero; Revathy Krishnamurthi; Heather Allison; Ian Goodhead; Chloë James (2023). Viral genomes from GenBank (reference) - Comparative analysis of gene prediction tools for viral genome annotation [Dataset]. http://doi.org/10.6084/m9.figshare.21353829.v1
    Available download formats: application/x-gzip
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    figshare
    Authors
    Enrique Gonzalez Tortuero; Revathy Krishnamurthi; Heather Allison; Ian Goodhead; Chloë James
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The file "viral.genomic.gbk.tar.gz" contains all the RefSeq viral database information in GenBank format and serves as the gold standard for the comparisons. It should be used as is with the script "genecounter.py" to count the number of genes, and it is the second (mandatory) input file for "coordinateschecker.py", which counts true positives (TP), false positives (FP) and false negatives (FN). It can also be used for other evaluation purposes.
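The description does not say how "coordinateschecker.py" decides that a predicted gene matches a reference gene. The Python sketch below assumes one common convention in gene-prediction benchmarks: a prediction counts as a true positive when it shares the strand and the stop-codon (3') coordinate of a reference gene. The `Gene` type, the function names and the toy coordinates are hypothetical and are not taken from the authors' scripts.

```python
from typing import Iterable, NamedTuple


class Gene(NamedTuple):
    start: int   # 1-based leftmost coordinate
    end: int     # 1-based rightmost coordinate (inclusive)
    strand: str  # "+" or "-"


def stop_key(gene: Gene) -> tuple[int, str]:
    """Position of the 3' (stop-codon) end plus strand, used as the match key."""
    stop = gene.end if gene.strand == "+" else gene.start
    return stop, gene.strand


def count_tp_fp_fn(predicted: Iterable[Gene], reference: Iterable[Gene]) -> tuple[int, int, int]:
    """Count true positives, false positives and false negatives by stop-coordinate matching."""
    pred_keys = {stop_key(g) for g in predicted}
    ref_keys = {stop_key(g) for g in reference}
    tp = len(pred_keys & ref_keys)  # predicted and present in the reference
    fp = len(pred_keys - ref_keys)  # predicted but absent from the reference
    fn = len(ref_keys - pred_keys)  # in the reference but not predicted
    return tp, fp, fn


# Hypothetical toy annotation of one genome
reference = [Gene(1, 300, "+"), Gene(400, 900, "-"), Gene(1000, 1500, "+")]
predicted = [Gene(10, 300, "+"), Gene(400, 900, "+"), Gene(1600, 1800, "+")]
print(count_tp_fp_fn(predicted, reference))  # (1, 2, 2)
```

Matching on the stop coordinate tolerates a different start-codon call (the first prediction above); a stricter benchmark could require both ends to match.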

  3. IVDB - Influenza Virus Database

    • scicrunch.org
    • neuinfo.org
    Updated Dec 4, 2023
    (2023). IVDB - Influenza Virus Database [Dataset]. http://identifiers.org/RRID:SCR_013404
    Dataset updated
    Dec 4, 2023
    Description

    IVDB hosts complete genome sequences of influenza A virus generated by BGI and curates all other published influenza virus sequences after expert annotation. For efficient data use, its Q-Filter system classifies and ranks all nucleotide sequences into 7 categories according to sequence content and integrity. IVDB provides a series of tools and viewers for comparative analysis of viral genomes, genes, genetic polymorphisms and phylogenetic relationships. A search system lets users retrieve combinations of different data types by setting various search options. To facilitate analysis of global viral transmission and evolution, the IV Sequence Distribution Tool (IVDT) displays the worldwide geographic distribution of viral genotypes and couples genomic data with epidemiological data. BLAST, multiple sequence alignment and phylogenetic analysis tools are integrated for online data analysis. Furthermore, IVDB offers instant access to pre-computed alignments and polymorphism analyses of influenza virus genes and proteins, and presents the results as SNP distribution plots and minor allele distributions. IVDB aims to be a powerful information resource and analysis workbench for scientists working on IV genetics, evolution, diagnostics, vaccine development and drug design.

  4. Viral RefSeq databases for Centrifuge, Kraken2 and DIAMOND

    • zenodo.org
    • datadryad.org
    application/gzip, txt
    Updated Jun 5, 2022
    Anna-Sapfo Malaspinas; Samuel Neuenschwander; Yami Arizmendi Cárdenas (2022). Viral RefSeq databases for Centrifuge, Kraken2 and DIAMOND [Dataset]. http://doi.org/10.5061/dryad.mkkwh711w
    Available download formats: txt, application/gzip
    Dataset updated
    Jun 5, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anna-Sapfo Malaspinas; Samuel Neuenschwander; Yami Arizmendi Cárdenas
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Owing to technological advances in ancient DNA, it is now possible to sequence viruses from the past to track down their origin and evolution. However, ancient DNA data are considerably more degraded and contaminated than modern data, making the identification of ancient viral genomes particularly challenging. Several methods to characterise the modern microbiome (and, within this, the virome) have been developed, in particular tools that assign sequenced reads to specific taxa in order to characterise the organisms present in a sample of interest. While these tools are routinely used on modern data, their performance when applied to ancient microbiome data to screen for ancient viruses remains unknown.

    In this work, we conducted an extensive simulation study using public viral sequences to establish which tool is the most suitable to screen ancient samples for human DNA viruses. We compared the performance of four widely used classifiers, namely Centrifuge, Kraken2, DIAMOND and MetaPhlAn2, in correctly assigning sequencing reads to the corresponding viruses. To do so, we simulated reads by adding noise typical of ancient DNA to a set of publicly available human DNA viral sequences and to the human genome. We fragmented the DNA into different lengths, added sequencing error and C to T and G to A deamination substitutions at the read termini. Then we measured the resulting sensitivity and precision for all classifiers.
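To make the simulation steps concrete, here is a minimal Python sketch of fragmenting a genome and adding sequencing error plus terminal C-to-T and G-to-A deamination. It is not the authors' pipeline: the function names, the uniform fragment-length distribution, the flat damage rate inside a fixed terminal window and all parameter values are assumptions chosen for illustration.

```python
import random

BASES = "ACGT"


def fragment(genome: str, min_len: int, max_len: int, rng: random.Random) -> list[str]:
    """Cut a genome into consecutive fragments with lengths drawn uniformly
    from [min_len, max_len]; the last fragment may be shorter."""
    fragments, pos = [], 0
    while pos < len(genome):
        length = rng.randint(min_len, max_len)
        fragments.append(genome[pos:pos + length])
        pos += length
    return fragments


def add_damage(read: str, deam_rate: float, term_len: int, err_rate: float,
               rng: random.Random) -> str:
    """Apply simplified ancient-DNA noise to one read.

    * C->T with probability deam_rate in the first term_len bases (5' end)
    * G->A with probability deam_rate in the last term_len bases (3' end)
    * a uniform substitution with probability err_rate at every position
    """
    bases = list(read)
    n = len(bases)
    for i, base in enumerate(bases):
        if i < term_len and base == "C" and rng.random() < deam_rate:
            bases[i] = "T"
        elif i >= n - term_len and base == "G" and rng.random() < deam_rate:
            bases[i] = "A"
        if rng.random() < err_rate:
            bases[i] = rng.choice([b for b in BASES if b != bases[i]])
    return "".join(bases)


rng = random.Random(42)  # fixed seed for reproducibility
genome = "".join(rng.choice(BASES) for _ in range(500))  # hypothetical genome
reads = [add_damage(f, deam_rate=0.3, term_len=5, err_rate=0.01, rng=rng)
         for f in fragment(genome, 30, 70, rng)]
print(len(reads), [len(r) for r in reads][:5])
print(add_damage("CCCAAAGGG", deam_rate=1.0, term_len=3, err_rate=0.0, rng=rng))  # TTTAAAAAA
```

Simulators used in practice typically let the damage rate decay with distance from the read end; the flat window here is a simplification.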

    Across most simulations, more than 228 of the 233 simulated viruses are recovered by Centrifuge, Kraken2 and DIAMOND, whereas MetaPhlAn2 recovers only around one third. Overall, Centrifuge and Kraken2 perform best, with the highest sensitivity and precision. We found that deamination damage has little impact on the performance of the classifiers, less than sequencing error and read length. Since Centrifuge can handle short reads (in contrast to DIAMOND and Kraken2 with default settings) and achieves the highest sensitivity and precision at the species level across all the simulations performed, it is our recommended tool. Regardless of the tool used, our simulations indicate that, for ancient human studies, users should apply strict filters to remove all reads of potential human origin. Finally, we recommend verifying which species are present in the database used, as default databases may lack sequences for viruses of interest.
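The comparison above is stated in terms of sensitivity and precision. A minimal Python sketch of both metrics at the taxon level, assuming the set-based definitions sensitivity = TP / (TP + FN) and precision = TP / (TP + FP); the study may instead define them per read, and the virus labels are placeholders.

```python
def sensitivity_precision(detected: set[str], truth: set[str]) -> tuple[float, float]:
    """Taxon-level sensitivity (recall) and precision from sets of species labels."""
    tp = len(detected & truth)  # simulated species that were recovered
    fn = len(truth - detected)  # simulated species that were missed
    fp = len(detected - truth)  # reported species that were not simulated
    sensitivity = tp / (tp + fn) if truth else 0.0
    precision = tp / (tp + fp) if detected else 0.0
    return sensitivity, precision


truth = {"virus_A", "virus_B", "virus_C", "virus_D"}  # placeholder labels
detected = {"virus_A", "virus_B", "virus_C", "virus_X"}
print(sensitivity_precision(detected, truth))  # (0.75, 0.75)

# Recovering 228 of 233 simulated viruses corresponds to a sensitivity of about 0.979
print(round(228 / 233, 3))  # 0.979
```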

  5. T4-like genome database

    • dknet.org
    • scicrunch.org
    Updated Oct 16, 2019
    (2019). T4-like genome database [Dataset]. http://identifiers.org/RRID:SCR_005367
    Dataset updated
    Oct 16, 2019
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE, documented August 22, 2016. A database of information on bacterial phages. It contains multiple phage genomes, which users can search with BLAST and MegaBLAST, and also hosts a Phage Forum in which users can discuss phage data. Completed phage genomes can be browsed interactively: the browser allows users to scan a genome for particular features and to download sequence information plus analyses of those features. Genome views show named genes, BLAST similarities to other phages, predicted tRNAs and other sequence features.

  6. VIRsiRNAdb

    • dknet.org
    Updated Aug 31, 2024
    (2024). VIRsiRNAdb [Dataset]. http://identifiers.org/RRID:SCR_006108
    Dataset updated
    Aug 31, 2024
    Description

    VIRsiRNAdb is a curated database of experimentally validated viral siRNAs/shRNAs targeting diverse genes of 42 important human viruses, including influenza, SARS and hepatitis viruses. Submissions are welcome. Currently, the database provides detailed experimental information on 1358 siRNAs/shRNAs, including siRNA sequence, virus subtype, target gene, GenBank accession, design algorithm, cell type, test object, test method and efficacy (mostly quantitative). Further, wherever available, information on alternative efficacies of more than 300 siRNAs derived from different assays has also been incorporated. The database supports search, advanced search (using the Boolean operators AND and OR), browsing (with data sorting), internal linking and external linking to other databases (PubMed, GenBank, ICTV). Useful siRNA analysis tools are also provided, e.g. siTarAlign for aligning siRNA sequences with reference viral genomes or user-defined sequences. VIRsiRNAdb should prove useful to RNAi researchers, especially in the development of siRNA-based antiviral therapeutics.
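siTarAlign's method is not described here. As a hypothetical illustration of the underlying idea, the Python sketch below locates where an siRNA guide (antisense) strand could pair with a viral sense-strand sequence: it reverse-complements the guide and scans the genome, allowing a configurable number of mismatches. The function names, sequences and mismatch tolerance are assumptions.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_target_sites(guide: str, genome: str, max_mismatches: int = 0) -> list[tuple[int, int]]:
    """Return (0-based position, mismatch count) for every window the guide strand can pair with.

    RNA input is converted to the DNA alphabet (U -> T) before comparison.
    """
    target = reverse_complement(guide.upper().replace("U", "T"))  # sense-strand target site
    genome = genome.upper().replace("U", "T")
    k = len(target)
    hits = []
    for i in range(len(genome) - k + 1):
        mismatches = sum(a != b for a, b in zip(target, genome[i:i + k]))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


# Toy 7-nt example; real siRNA guides are usually around 21 nt long
print(find_target_sites("UGUAAUC", "CCGATTACACC"))  # [(2, 0)]
```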

  7. Virosaurus dataset

    • explore.openaire.eu
    Updated Jan 17, 2022
    Anne Gleizes; Philippe Le Mercier; Edouard de Castro (2022). Virosaurus dataset [Dataset]. http://doi.org/10.5281/zenodo.5863049
    Dataset updated
    Jan 17, 2022
    Authors
    Anne Gleizes; Philippe Le Mercier; Edouard de Castro
    Description

    Virosaurus (from virus thesaurus) is a curated virus genome database aimed at facilitating clinical metagenomics analysis. The data comprise clustered and annotated sequences of vertebrate viruses, other viruses (insect, fungal and eukaryotic microorganism viruses) and plant viruses in FASTA format. Virosaurus also provides a complete virus sequence dataset for all those viruses, comprising complete genomes for nonsegmented viruses and complete segments for segmented viruses. Complete sequences: this dataset contains full-length genomes (monopartite viruses) or segments (segmented viruses) for all vertebrate virus families. Virosaurus: Virus reference sequence databases for clinical metagenomics. All complete sequences were clustered at 90% to remove redundancy in Virosaurus Vertebrate 90 (23,615 FASTAs), or at 98% in Virosaurus Vertebrate 98 (73,160 FASTAs). Many clusters can belong to the same virus species; for example, there are 100 Lassa virus clusters in Virosaurus90 and 638 in Virosaurus98. The FASTA headers have been annotated with metadata to facilitate metagenomic analysis. For instance, viral nucleic acid is annotated as RNA, DNA or RNA/DNA, which improves interpretation when sequencing either molecule.
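The Virosaurus description does not name the clustering software. The Python sketch below shows one generic way to cluster at a fixed identity threshold: greedy, length-sorted centroid clustering, where each sequence joins the first existing centroid it matches at or above the threshold and otherwise founds a new cluster. The identity measure (1 minus edit distance over the longer length) is a simplification of alignment-based identity, and all names and sequences are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Approximate pairwise identity: 1 - edit distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def greedy_clusters(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Map each centroid ID to the IDs in its cluster (the centroid is listed first)."""
    order = sorted(seqs, key=lambda sid: len(seqs[sid]), reverse=True)  # longest first
    clusters: dict[str, list[str]] = {}
    for sid in order:
        for centroid, members in clusters.items():
            if identity(seqs[sid], seqs[centroid]) >= threshold:
                members.append(sid)
                break
        else:  # no existing centroid was similar enough
            clusters[sid] = [sid]
    return clusters


seqs = {
    "seq1": "ACGT" * 30,           # 120 nt
    "seq2": "ACGT" * 29 + "ACGA",  # 1 substitution relative to seq1
    "seq3": "T" * 100,             # unrelated
}
print(greedy_clusters(seqs, threshold=0.90))  # {'seq1': ['seq1', 'seq2'], 'seq3': ['seq3']}
```

Because sequences are visited longest first, every centroid is the longest member of its cluster. Raising the threshold (for example from 0.90 to 0.98) splits sequences into more, smaller clusters.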

  8. GCVDB Viruses

    • figshare.com
    application/gzip
    Updated Jan 15, 2024
    Bailey Wallace (2024). GCVDB Viruses [Dataset]. http://doi.org/10.6084/m9.figshare.24968805.v1
    Available download formats: application/gzip
    Dataset updated
    Jan 15, 2024
    Dataset provided by
    figshare
    Authors
    Bailey Wallace
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Viral genomes and genome fragments from the Global Coral Viruses Database (GCVDB).

  9. VIDA

    • neuinfo.org
    • scicrunch.org
    Updated Oct 16, 2019
    (2019). VIDA [Dataset]. http://identifiers.org/RRID:SCR_007111
    Dataset updated
    Oct 16, 2019
    Description

    VIDA contains a collection of homologous protein families derived from open reading frames from complete and partial virus genomes. For each family, users can get an alignment of the conserved regions, functional and taxonomy information, and links to DNA sequences and structures.

    • Search homologous protein families from particular virus families
    • Links to complete genome sequences: Arteriviridae, Coronaviridae, Herpesviridae, Poxviridae

    The Virus Database at University College London has been developed as a system to organize animal virus open reading frame sequences. All known and predicted protein sequences from complete and partial genomes of particular virus families are extracted from GenBank and filtered to remove 100% redundancy. On the basis of sequence similarity, the sequences are then clustered into homologous protein families (HPFs). The families are enriched with annotations including function and functional classification, related protein structures, taxonomy, protein length, boundaries of the conserved region(s), virus-specific gene name, and links to EMBL entries and SWISSPROT.
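The "filtered to remove 100% redundancy" step can be read as dropping exact duplicate sequences. A minimal Python sketch under that assumption; VIDA's actual procedure is not described, and some tools also treat a sequence contained entirely within a longer one as 100% redundant, which this sketch does not do. Accession names are placeholders.

```python
def remove_identical(records: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep the first (accession, sequence) record for each distinct sequence."""
    seen: set[str] = set()
    kept = []
    for accession, seq in records:
        key = seq.upper()  # compare sequences case-insensitively
        if key not in seen:
            seen.add(key)
            kept.append((accession, seq))
    return kept


records = [("prot_1", "MKVLA"), ("prot_2", "mkvla"), ("prot_3", "MKVLAG")]
print(remove_identical(records))  # [('prot_1', 'MKVLA'), ('prot_3', 'MKVLAG')]
```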

  10. VirGen

    • bioregistry.io
    Updated Feb 20, 2024
    (2024). VirGen [Dataset]. https://bioregistry.io/registry/virgen
    Dataset updated
    Feb 20, 2024
    Description

    VirGen is a comprehensive viral genome resource that organizes the ‘sequence space’ of viral genomes in a structured fashion. It was developed to serve as an annotated and curated database of complete viral genome sequences.

  11. Metadata record for: Domain-centric database to uncover structure of...

    • springernature.figshare.com
    txt
    Updated Jun 1, 2023
    Scientific Data Curation Team (2023). Metadata record for: Domain-centric database to uncover structure of minimally characterized viral genomes [Dataset]. http://doi.org/10.6084/m9.figshare.12319631.v1
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Scientific Data Curation Team
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains key characteristics about the data described in the Data Descriptor Domain-centric database to uncover structure of minimally characterized viral genomes. Contents:

        1. Human-readable metadata summary table in CSV format
        2. Machine-readable metadata file in JSON format

  12. COVID-19 Genome Sequence Dataset

    • registry.opendata.aws
    • catalog.midasnetwork.us
    Updated Jul 9, 2020
    National Library of Medicine (NLM) (2020). COVID-19 Genome Sequence Dataset [Dataset]. https://registry.opendata.aws/ncbi-covid-19/
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    National Library of Medicine (NLM)
    Description

    This repository within the ACTIV TRACE initiative houses a comprehensive collection of datasets related to SARS-CoV-2. The processing of SARS-CoV-2 Sequence Read Archive (SRA) files has been optimized to identify genetic variations in viral samples, and the results are presented in Variant Call Format (VCF). Each VCF file corresponds to the accession ID of its parent SRA run. The data are also available in Parquet format, which makes them easier to search and filter with Amazon Athena. The SARS-CoV-2 Variant Calling Pipeline is designed to process new data every six hours, and the AWS ODP bucket is updated daily.
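Once a VCF file has been downloaded, its data lines can be read with plain Python. The sketch below parses the eight fixed VCF columns and the INFO field; the example record and its INFO keys (DP, AF) are made up and may not match the fields actually emitted by the ACTIV TRACE pipeline.

```python
from typing import Iterable, Iterator


def parse_info(info: str) -> dict[str, object]:
    """Split a VCF INFO column into key=value pairs; flags without '=' become True."""
    fields: dict[str, object] = {}
    if info == ".":
        return fields
    for item in info.split(";"):
        if not item:
            continue  # tolerate empty items such as a trailing ';'
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def parse_vcf(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per VCF data line, skipping blank and '#' header lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
        yield {
            "chrom": chrom,
            "pos": int(pos),        # VCF positions are 1-based
            "id": vid,
            "ref": ref,
            "alt": alt.split(","),  # multiple ALT alleles are comma-separated
            "qual": None if qual == "." else float(qual),
            "filter": flt,
            "info": parse_info(info),
        }


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "NC_045512.2\t23403\t.\tA\tG\t.\tPASS\tDP=120;AF=0.98",  # hypothetical record
]
for record in parse_vcf(example):
    print(record["pos"], record["ref"], record["alt"], record["info"])
```

For queries spanning many runs, the Parquet copies searchable through Amazon Athena avoid downloading individual VCF files.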

  13. "Genome binning of viral entities from bulk metagenomics data" - CAMISIM...

    • data.niaid.nih.gov
    Updated Jan 5, 2022
    Johansen, Joachim (2022). "Genome binning of viral entities from bulk metagenomics data" - CAMISIM simulated datasets and genomes [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5676246
    Dataset updated
    Jan 5, 2022
    Dataset authored and provided by
    Johansen, Joachim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Genome binning of viral entities from bulk metagenomics data

    Authors

    Joachim Johansen (1, 2), Damian R. Plichta (2), Jakob Nybo Nissen (1, 3), Marie Louise Jespersen (1, 4), Shiraz A. Shah (5), Ling Deng (6), Jakob Stokholm (5, 6), Hans Bisgaard (5), Dennis Sandris Nielsen (6), Søren Sørensen (7), Simon Rasmussen (1)

    Affiliations

    1 Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen N, Denmark

    2 Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA

    3 Statens Serum Institut, Viral & Microbial Special diagnostics, Copenhagen, Denmark

    4 National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark

    5 Copenhagen Prospective Studies on Asthma in Childhood (COPSAC), Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark

    6 Section of Food Microbiology and Fermentation, Department of Food Science, Faculty of Science, University of Copenhagen, Copenhagen, Denmark

    7 Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark

    Methods description

    We compared the viral binning performance of VAMB and MetaBAT2, using the official CAMI consortium method to create assemblies and metagenome profiles. To this end, we generated 3 different metagenome compositions with up to 308 reference genomes: (1) one mixing bacteria, plasmids and viruses to test binning in complex samples, i.e. high diversity; (2) one with only crass-like viruses to test binning of highly similar viruses, i.e. high relatedness; and (3) a set of small viruses (<6,000 bp), including members of the Microviridae family, to address size bias. Bacterial genomes were gathered from NCBI's RefSeq genome repository (2021), plasmids from the PLSDB database (v. 2021_06_23) and viral genomes from the recent MGV database.

    Dataset A contained a mixture of bacteria (N=8), plasmids (N=20) and viruses (N=280) to test binning in complex samples, i.e. high diversity. Dataset B contained only crass-like viruses (N=80) to test binning of highly similar viruses, i.e. high relatedness. Dataset C contained small viruses (N=50, <6,000 bp) of the Microviridae family to address size bias. Bacterial genomes were sampled from the RefSeq genome repository (2021), plasmids from the PLSDB database and viral genomes from the recent MGV database (Nayfach, et al. Nature Microbiology 2021).

  14. Databases for NEXT-RSV-SEQ (RSV, HMPV, PIV)

    • zenodo.org
    zip
    Updated May 16, 2024
    Stephan Fuchs; Sophie Köndgen (2024). Databases for NEXT-RSV-SEQ (RSV, HMPV, PIV) [Dataset]. http://doi.org/10.5281/zenodo.8133844
    Available download formats: zip
    Dataset updated
    May 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Stephan Fuchs; Sophie Köndgen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Here, we provide databases ready for use in sequencing read decontamination with our Next-RSV-SEQ pipeline (https://gitlab.com/rki_bioinformatics/next-rsv-seq), designed for viral genome assembly using Illumina data.

    Available Databases:

    • Human orthopneumovirus / Human respiratory syncytial virus (RSV): [RSV_GRCh38_2022-04-06.zip]
    • Human metapneumovirus (HMPV): [HMPV_2022-05-28.zip]
    • Parainfluenza virus (PIV): [PIV_2022-05-28.zip]

    These databases were created using Kraken2 and are based on all complete viral genome sequences available in the NCBI Reference Sequence Database (RefSeq) as of April 6, 2022 (RSV) and May 28, 2022 (HMPV and PIV). The RSV database also includes the human genome sequence GRCh38 (Genome Reference Consortium Human Build 38). The databases have been compressed into zip format for easy downloading; unpack the respective zip archive before use.

  15. Hepatitis Virus B Database

    • scicrunch.org
    Updated Jan 29, 2022
    (2022). Hepatitis Virus B Database [Dataset]. http://identifiers.org/RRID:SCR_007705
    Dataset updated
    Jan 29, 2022
    Description

    HepSEQ is the International Repository for Hepatitis B Virus Strain Data. It is a web-accessible, quality-based molecular, clinical and epidemiological database for hepatitis B infection and provides a tool for the research community and for those involved in hepatitis B case management. The database currently has 1012 patient records and 1253 viral sequences. The quality of all submitted sequences is checked. The tools provided include:

    • SeqMatch: search the database for matching sequences
    • Genotyper: genotype HBV strains (based on HBV surface antigen genes)
    • Gene Mutation: display the sequences that contain mutations in HBV coding regions
    • Mutation Annotator: annotate sequences for mutations known to be associated with antiviral resistance

    Development of this web database is funded by the UK Department of Health; the database is curated and hosted by the Health Protection Agency.

  16. Virus+ Sequence Masked Mouse Reference Genome (GRCm38)

    • zenodo.org
    • explore.openaire.eu
    application/gzip
    Updated Feb 9, 2021
    Scott A Handley (2021). Virus+ Sequence Masked Mouse Reference Genome (GRCm38) [Dataset]. http://doi.org/10.5281/zenodo.4116249
    Available download formats: application/gzip
    Dataset updated
    Feb 9, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Scott A Handley
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A version of the mouse genome (GRCm38) masked for all possible viral sequences.

    See Virus+ Masked Human Genome for a masked human reference database.

    The following commands were used to generate the additional virus sequence masked reference database:

    1) Download all RefSeq and Neighbor nucleotide records:

    https://www.ncbi.nlm.nih.gov/nuccore/?term=Viruses[Organism]%20NOT%20cellular%20organisms[ORGN]%20NOT%20wgs[PROP]%20NOT%20gbdiv%20syn[prop]%20AND%20(srcdb_refseq[PROP]%20OR%20nuccore%20genome%20samespecies[Filter])

    2) Shred the downloaded viral genomes using shred.sh from the bbtools package

    shred.sh in=refseq_virus_reformated.fasta out=virus_shred.fasta.gz length=85 minlength=75 overlap=30

    3) Map the shredded virus sequences to the GRCm38 genome using bbmap.sh from the bbtools package

    bbmap.sh ref=GRCm38.fa.gz in=virus_shred.fasta.gz outm=map_mouse_all_viruses.sam minid=0.90

    4) Mask the virus-mapped regions of the GRCm38 genome using bbmask.sh from the bbtools package

    bbmask.sh in=GRCm38.fa.gz out=GRCm38_virus_masked.fasta.gz sam=map_mouse_all_viruses.sam

    5) Remove all Ns to further reduce file size using seqkit

    seqkit replace -is -p "n" -r "" GRCm38_virus_masked.fasta.gz -o mouse_virus_masked.fasta_Ns_removed.gz
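Step 2 cuts each viral genome into overlapping fragments so that host regions resembling viral sequence can be found by read mapping in step 3. A rough Python sketch of that fragmentation, assuming fixed 85-nt windows that advance by 85 - 30 = 55 nt and a trailing fragment that is dropped when shorter than 75 nt; shred.sh's exact handling of sequence ends may differ, so this only illustrates the parameters and does not replace the tool.

```python
def shred(seq: str, length: int = 85, overlap: int = 30, min_length: int = 75) -> list[str]:
    """Split seq into fragments of `length` nt overlapping by `overlap` nt.

    A fragment is kept only if it is at least `min_length` nt long.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must be >= 0 and smaller than length")
    step = length - overlap
    fragments = []
    for start in range(0, len(seq), step):
        piece = seq[start:start + length]
        if len(piece) >= min_length:
            fragments.append(piece)
        if start + length >= len(seq):  # this window already reaches the end
            break
    return fragments


genome = "ACGT" * 50  # hypothetical 200-nt sequence
pieces = shred(genome)
print(len(pieces), [len(p) for p in pieces])  # 3 [85, 85, 85]
```

With these settings the final 35-nt window of the 200-nt example is discarded, so a few terminal bases are not represented by any fragment.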

    Additional References:

    1. bbtools
    2. seqkit
    3. NCBI Virus Genome RefSeq
  17. COG-UK Viral Genome Sequences

    • dtechtive.com
    • find.data.gov.scot
    Updated May 30, 2023
    COVID-19 GENOMICS UK (2023). COG-UK Viral Genome Sequences [Dataset]. https://dtechtive.com/datasets/26040
    Dataset updated
    May 30, 2023
    Dataset provided by
    COVID-19 GENOMICS UK
    Area covered
    United Kingdom
    Description

    COG-UK Consortium has published a dataset which contains over 20K SARS-CoV-2 viral genome sequences available as open access.

  18. Data from: Viral tagging reveals discrete populations in Synechococcus viral...

    • search.dataone.org
    • datadryad.org
    Updated Apr 19, 2025
    Li Deng; J. Cesar Ignacio-Espinoza; Ann C. Gregory; Bonnie T. Poulos; Joshua S. Weitz; Philip Hugenholtz; Matthew B. Sullivan (2025). Viral tagging reveals discrete populations in Synechococcus viral genome sequence space [Dataset]. http://doi.org/10.5061/dryad.gr3ks
    Dataset updated
    Apr 19, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Li Deng; J. Cesar Ignacio-Espinoza; Ann C. Gregory; Bonnie T. Poulos; Joshua S. Weitz; Philip Hugenholtz; Matthew B. Sullivan
    Time period covered
    Jan 1, 2015
    Description

    Microbes and their viruses drive myriad processes across ecosystems ranging from oceans and soils to bioreactors and humans. Despite this importance, microbial diversity is only now being mapped at scales relevant to nature, while the viral diversity associated with any particular host remains little researched. Here we quantify host-associated viral diversity using viral-tagged metagenomics, which links viruses to specific host cells for high-throughput screening and sequencing. In a single experiment, we screened 10^7 Pacific Ocean viruses against a single strain of Synechococcus and found that naturally occurring cyanophage genome sequence space is statistically clustered into discrete populations. These population-based, host-linked viral ecological data suggest that, for this single host and seawater sample alone, there are at least 26 double-stranded DNA viral populations with estimated relative abundances ranging from 0.06 to 18.2%. These populations include previously cultivated...

  19. Data from: HoloBee Database v2016.1

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Apr 21, 2025
    Agricultural Research Service (2025). HoloBee Database v2016.1 [Dataset]. https://catalog.data.gov/dataset/holobee-database-v2016-1-9e8e9
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    Organisms living in honey bees and honey bee colonies form large associative holobiont communities that are integral to bee biology. High-throughput sequencing approaches to characterize these holobiont communities from honey bees in various states of health and disease are now commonplace, producing large amounts of nucleotide sequence data that must be accurately and consistently analyzed in order to produce reliable and comparable reports. In addition, new species designations and revisions are actively being made from honey bee holobiont communities, complicating nomenclature in larger databases where taxonomic descriptions associated with archived sequences can quickly become outdated and misleading. To improve the accuracy and consistency of honey bee holobiont research, we have developed HoloBee: a curated database of publicly accessioned nucleotide sequences from the honey bee holobiont community.

    With rare exceptions noted by curators, sequences used in HoloBee were obtained from, or in association with, Apis mellifera (Western honey bee) as well as other honey bee species where available (e.g. Apis cerana, Apis dorsata, Apis laboriosa, Apis koschevnikovi, Apis florea, Apis andreniformis and Apis nigrocincta). Sources include material from within or on the surface of honey bees (adult, pupae, larvae, egg), corbicular pollen, bee bread, royal jelly, honey, comb, hive surfaces (e.g. bottom board debris, frames, landing platforms), and isolates of microbes, parasites and pathogens from honey bees.

    HoloBee contains two non-overlapping sets of sequence data, HoloBee-Barcode and HoloBee-Mop, each of which has distinct intended uses. HoloBee-Barcode is a non-redundant database of taxonomically informative barcoding loci for all viruses, bacteria, fungi, protozoans and metazoans associated with honey bees (Apis spp.). It was created from an exhaustive master sequence archive of all valid holobiont sequences.
    Redundancy was removed from this master archive using a clustering algorithm that grouped sequences with ≥ 99% identity and retained the longest sequence from each cluster as the representative accession for that sequence type (“centroid”). These centroid sequences were concatenated into a FASTA-formatted file to create the HoloBee-Barcode database. The associated taxonomy for each centroid, from Superkingdom through Species and Strain/Isolate, was individually reviewed and, when necessary, corrected by a curator. Cross-reference tables (separated according to 5 major taxonomic groups) provide a user-friendly outline of information for each centroid accession within HoloBee-Barcode, including taxonomy, gene/product name, sequence length, the unaltered NCBI definition line, the number and identity of redundant sequences clustered within each centroid, and any additional information provided by the curator. HoloBee-Barcode centroid counts are: Viruses = 86; Bacteria = 496; Fungi = 41; Protozoa = 4; Metazoa = 60. HoloBee-Barcode is intended to improve and standardize quantitative and qualitative metagenomic descriptions of holobiont communities associated with honey bees by providing a curated set of barcode sequences. The goal of genetic barcoding is to associate a nucleotide sequence sample with a taxonomically valid species. Genomic regions targeted for barcoding varied by taxonomic group. The small subunit (SSU) ribosomal RNA, or 16S rRNA, is the most commonly used barcode for bacteria and is used in HB-Barcode. These 16S rRNA sequences will support the analysis of data generated with the widely used approach of amplicon-based 16S rRNA deep sequencing to study microbiota communities. Although barcode markers for fungi are less definitive than those for bacteria, HB-Barcode defaults to the ribosomal RNA internal transcribed spacer region (ITS), which typically includes ITS-1, 5.8S and ITS-2.
    For some fungal clades that cannot be resolved by ITS, other barcode markers were selected. Most barcodes for metazoan taxa are the mitochondrial cytochrome c oxidase subunit I (COI) locus. Complete mitochondrial DNA (mtDNA) sequences for Apis cerana (Asian honey bee) and Galleria mellonella (greater wax moth) are included as barcodes for these species. We note that A. cerana mtDNA is included because A. cerana is considered a potentially invasive honey bee species, and monitoring for its occurrence is practiced regionally, including in Australia, New Zealand and the USA. Protozoan barcodes include cytochrome b (Cytb), SSU rRNA, or ITS, while entire genomes are used for viral barcoding.

    HoloBee-Mop is a database composed mostly of chromosomal, mitochondrial and plasmid genome assemblies, in order to aggregate as much honey bee holobiont genomic sequence information as possible. For a few organisms without genome assembly data, transcriptome data are included (e.g. Aethina tumida, the small hive beetle). Unlike HoloBee-Barcode, HoloBee-Mop did not undergo redundancy removal, so this resource provides an archive of nucleotide sequence assemblies from honey bee holobionts. However, since full viral genomes are used in HoloBee-Barcode, only redundant viral sequences occur in HoloBee-Mop. All accessions within each of these assemblies were concatenated into a single fasta-formatted file to create the HoloBee-Mop database. The intended purpose of HoloBee-Mop is to improve honey bee genome and transcriptome assemblies by “mopping up” as much viral, bacterial, fungal, protozoan and non-honey bee metazoan sequence data as possible. Consequently, reads that map to none of HoloBee-Barcode, HoloBee-Mop, or the honey bee genome may contain unique data from taxonomic variants or novel species.
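    The last point amounts to a set difference: reads left over after mapping against the honey bee genome, HoloBee-Barcode and HoloBee-Mop are candidates for novel data. Below is a minimal Python sketch, assuming each mapping step has already produced SAM-formatted output from an aligner of your choice. It relies only on the SAM FLAG bit 0x4 ("segment unmapped"); the function names and the toy reference names are hypothetical.

```python
from typing import Iterable, Set

SAM_FLAG_UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def mapped_read_names(sam_lines: Iterable[str]) -> Set[str]:
    """Collect QNAMEs of alignment records whose FLAG lacks the unmapped bit."""
    names: Set[str] = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & SAM_FLAG_UNMAPPED:
            names.add(qname)
    return names


def residual_reads(all_reads: Iterable[str], *sam_sources: Iterable[str]) -> Set[str]:
    """Reads mapped by none of the SAM sources (e.g. bee genome, HB-Barcode, HB-Mop)."""
    explained: Set[str] = set()
    for sam in sam_sources:
        explained |= mapped_read_names(sam)
    return set(all_reads) - explained


if __name__ == "__main__":
    # Toy, hypothetical alignments: r1 maps to the bee genome, r2 to a barcode centroid.
    apis = ["@HD\tVN:1.6", "r1\t0\tchr_hypothetical\t1\t60\t4M\t*\t0\t0\tACGT\t*"]
    barcode = ["r2\t0\tcentroid_hypothetical\t1\t60\t4M\t*\t0\t0\tACGT\t*"]
    mop = ["r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*"]
    print(residual_reads(["r1", "r2", "r3"], apis, barcode, mop))  # {'r3'}
```

    In practice each SAM source could be an open text file handle, since iterating a text file yields its lines.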
    Details for each sequence assembly within HoloBee-Mop are tabulated in cross reference tables according to each major taxonomic group. HoloBee-Mop assembly counts are: Viruses = 2; Bacteria = 55; Fungi = 5; Protozoa = 1; Metazoa = 6.

    Follow the HoloBee database on Twitter at: https://twitter.com/HoloBee_db

    For questions about the HoloBee database, contact:
      HoloBee database team: holobee.db@gmail.com
      Jay Evans: Jay.Evans@ars.usda.gov
      Anna Childers: Anna.Childers@ars.usda.gov

    Resources in this dataset:

      Resource Title: HoloBee_v2016.1 sequence database
      File Name: HB_v2016.1.zip
      Resource Description: This compressed file contains two fasta sequence files:
        HB_Bar_v2016.1.fasta (HoloBee-Barcode database)
        HB_Mop_v2016.1.fasta (HoloBee-Mop database)
      md5 values:
        HB_v2016.1.zip: 6e372e443744282128eb51488176503f
        HB_Bar_v2016.1.fasta: 109e1f686a690c70ef78fc4b5066a01f
        HB_Mop_v2016.1.fasta: ced8c3f5987dce69e800c8c491471eba

      Resource Title: data dictionary for HoloBee_v2016.1
      File Name: Data_Dictionary_HoloBee_v2016.1.xlsx

      Resource Title: HoloBee_v2016.1 cross reference tables
      File Name: HB_v2016.1_crossref.zip
      Resource Description: This compressed file contains ten spreadsheet files (.xlsx) tabulating detailed information for all centroids (HoloBee-Barcode database) and sequence assemblies (HoloBee-Mop database) used in HoloBee v2016.1:
        HB_Bar_v2016.1_bacteria_crossref_2016-05-18.xlsx
        HB_Bar_v2016.1_fungi_crossref_2016-05-20.xlsx
        HB_Bar_v2016.1_metazoa_crossref_2016-05-16.xlsx
        HB_Bar_v2016.1_protozoa_crossref_2016-05-20.xlsx
        HB_Bar_v2016.1_viruses_crossref_2016-05-17.xlsx
        HB_Mop_v2016.1_bacteria_crossref_2016-05-12.xlsx
        HB_Mop_v2016.1_fungi_crossref_2016-05-12.xlsx
        HB_Mop_v2016.1_metazoa_crossref_2016-04-15.xlsx
        HB_Mop_v2016.1_protozoa_crossref_2016-04-11.xlsx
        HB_Mop_v2016.1_viruses_crossref_2016-05-12.xlsx
      md5 value:
        HB_v2016.1_crossref.zip: a8a57d92830eb77904743afc95980465

      Resource Title: data dictionary for HoloBee_v2016.1
      File Name: Data_Dictionary_HoloBee_v2016.1.csv
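    The published md5 values can be used to check the downloads before analysis. Below is a minimal Python sketch using the standard-library hashlib module. It assumes the two zip archives and the two unpacked fasta files sit side by side in one directory; the helper names and that layout are assumptions, not part of the dataset.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union

# md5 values as published for HoloBee v2016.1.
EXPECTED_MD5: Dict[str, str] = {
    "HB_v2016.1.zip": "6e372e443744282128eb51488176503f",
    "HB_Bar_v2016.1.fasta": "109e1f686a690c70ef78fc4b5066a01f",
    "HB_Mop_v2016.1.fasta": "ced8c3f5987dce69e800c8c491471eba",
    "HB_v2016.1_crossref.zip": "a8a57d92830eb77904743afc95980465",
}


def md5_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory: Union[str, Path]) -> Dict[str, bool]:
    """Hypothetical helper: True if the file exists in `directory` and its md5 matches."""
    results: Dict[str, bool] = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads(".").items():
        print(f"{name}: {'OK' if ok else 'missing or mismatched'}")
```

    Reading in chunks keeps memory use constant, which matters for the larger fasta files.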

  20. Hepatitis C Virus Database (HCVdb)

    • scicrunch.org
    Updated Jun 27, 2024
    (2024). Hepatitis C Virus Database (HCVdb) [Dataset]. http://identifiers.org/RRID:SCR_005718
    Dataset updated
    Jun 27, 2024
    Description

    The Hepatitis C Virus Database (HCVdb) is a cooperative project of several groups whose mission is to provide the scientific community studying the hepatitis C virus with a comprehensive suite of informational and analytical tools. The Viral Bioinformatics Resource Center (VBRC), the Immune Epitope Database and Analysis Resource (IEDB), the Broad Institute Microbial Sequencing Center (MSC), and the Los Alamos HCV Sequence Database (HCV-LANL) are combining forces to acquire and annotate data on the hepatitis C virus and to develop and use new tools that facilitate the study of this group of organisms.

COG-UK Viral Genome Sequences

95 scholarly articles cite this dataset