100+ datasets found
  1. GenBank

    • catalog.data.gov
    • datadiscovery.nlm.nih.gov
    Updated Jul 17, 2025
    Cite
    National Library of Medicine (2025). GenBank [Dataset]. https://catalog.data.gov/dataset/genbank-14853
    Dataset provided by
    National Library of Medicine
    Description

    NIH genetic sequence database: an annotated collection of all publicly available DNA sequences.

  2. GenBank

    • rrid.site
    • dknet.org
    Updated Jul 27, 2025
    Cite
    (2025). GenBank [Dataset]. http://identifiers.org/RRID:SCR_002760
    Description

    NIH genetic sequence database that provides an annotated collection of all publicly available DNA sequences for almost 280,000 formally described species (as of January 2014). These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt tool or the standalone Sequin program, and GenBank staff assign accession numbers upon receipt of the data. GenBank is part of the International Nucleotide Sequence Database Collaboration, and daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP.
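    GenBank records can also be retrieved programmatically through NCBI's E-utilities, the web interface behind Entrez. The sketch below only builds an efetch request URL for nucleotide records in FASTA format; the helper name `efetch_url` is illustrative, and the accession L49178.1 (the Pseudomonas tpm sequence cited in Cournoyer et al., 1998) is used purely as an example. Check the current E-utilities documentation for parameters and usage policy before sending requests.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch endpoint.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accessions, db="nucleotide", rettype="fasta", retmode="text"):
    """Build an efetch URL requesting the given accessions from an Entrez database.

    `efetch_url` is a hypothetical helper, not part of any NCBI library.
    """
    params = {
        "db": db,
        "id": ",".join(accessions),  # efetch accepts a comma-separated ID list
        "rettype": rettype,
        "retmode": retmode,
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    url = efetch_url(["L49178.1"])
    print(url)
    # To download, pass `url` to urllib.request.urlopen() (requires network access).
```

    The function only constructs a string, so it can be tested offline; actual downloads should respect NCBI's request-rate guidelines (about three requests per second without an API key).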

  3. Reference sequence database for eDNA metabarcoding of San Francisco estuary fishes and invertebrates

    • search.dataone.org
    • data.niaid.nih.gov
    Updated Nov 29, 2023
    Cite
    Raman Nagarajan; Ann Holmes; Andrea Schreier (2023). Reference sequence database for eDNA metabarcoding of San Francisco estuary fishes and invertebrates [Dataset]. http://doi.org/10.5061/dryad.0p2ngf25z
    Dataset provided by
    Dryad Digital Repository
    Authors
    Raman Nagarajan; Ann Holmes; Andrea Schreier
    Time period covered
    Jan 1, 2023
    Description

    Environmental DNA (eDNA) methods complement traditional monitoring and can be configured to detect multiple species simultaneously. One such approach, eDNA metabarcoding, uses high-throughput DNA sequencing to indirectly detect many different organisms, spanning broad taxonomic boundaries, from water samples. We are optimizing a non-invasive, low-cost eDNA metabarcoding protocol to be used in conjunction with existing monitoring programs. One resource that is currently lacking for metabarcoding studies in general, including those in the San Francisco Estuary (SFE), is a comprehensive database of DNA barcode reference sequences. Without this foundational data, many species go undetected or are misidentified in metabarcoding studies. To meet this need, we generated a custom barcode sequence database for the SFE by DNA sequencing and by mining public DNA sequence data for estuarine and freshwater species of interest to monitoring programs and ecological studies. Here we present custom referenc...
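    Why a missing reference leads to undetected or misidentified species can be illustrated with a deliberately simplified classifier: each read is compared with every reference barcode, and the best match is reported only if its percent identity clears a cutoff. This is a sketch, not the authors' pipeline; it assumes reads and references are already aligned to the same region (no gaps), and the function names and the 97% cutoff are hypothetical choices.

```python
def percent_identity(read, reference):
    """Percent of matching positions over the shorter of two pre-aligned sequences."""
    n = min(len(read), len(reference))
    if n == 0:
        return 0.0
    matches = sum(1 for a, b in zip(read, reference) if a == b)
    return 100.0 * matches / n


def assign_taxon(read, references, min_identity=97.0):
    """Return the best-matching taxon, or None if no reference clears min_identity.

    references: dict mapping a taxon name to its reference barcode sequence.
    """
    best_taxon, best_pid = None, 0.0
    for taxon, ref in references.items():
        pid = percent_identity(read, ref)
        if pid > best_pid:
            best_taxon, best_pid = taxon, pid
    return best_taxon if best_pid >= min_identity else None
```

    If the species that shed the DNA has no barcode in `references`, its reads either fall below the cutoff (undetected) or are attributed to the closest relative that is present (misidentified), which is the gap a curated local reference database is meant to close.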

  4. NCBI Protein Database

    • neuinfo.org
    • scicrunch.org
    Updated Aug 31, 2024
    Cite
    (2024). NCBI Protein Database [Dataset]. http://identifiers.org/RRID:SCR_003257
    Description

    Databases of protein sequences and 3D protein structures. Sequences are collected from several sources, including translations of annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB.

  5. The tpm metabarcoding DNA sequence database for taxonomic allocations using the Mothur and DADA2 bio-informatic tools (Version 2.0.0)

    • zenodo.org
    Updated Jul 19, 2024
    Cite
    POZZI Adrien C.M.; BOUCHALI Rayan; MARJOLET Laurence; COURNOYER Benoît (2024). The tpm metabarcoding DNA sequence database for taxonomic allocations using the Mothur and DADA2 bio-informatic tools (Version 2.0.0) [Dataset]. http://doi.org/10.5281/zenodo.4492211
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    POZZI Adrien C.M.; BOUCHALI Rayan; MARJOLET Laurence; COURNOYER Benoît
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The tpm metabarcoding DNA sequence database for taxonomic allocations using the Mothur and DADA2 bio-informatic tools (Version 2.0.0)

    A.C.M. Pozzi¹, R. Bouchali¹, L. Marjolet¹, B. Cournoyer¹

    ¹ University of Lyon, UMR Ecologie Microbienne Lyon (LEM), CNRS 5557, INRAE 1418, Université Claude Bernard Lyon 1, VetAgro Sup, Research Team “Bacterial Opportunistic Pathogens and Environment” (BPOE), 69280 Marcy-L’Etoile, France.

    Corresponding authors:

    • A.C.M. Pozzi, UMR Microbial Ecology, CNRS 5557, INRAE 1418, VetAgro Sup, Main building, aisle 3, 1st floor, 69280 Marcy-L’Etoile, France. Tel. (+33) 478 87 39 47. Fax. (+33) 472 43 12 23. Email: adrien.meynier_pozzi@vetagro-sup.fr
    • B. Cournoyer, UMR Microbial Ecology, CNRS 5557, INRAE 1418, VetAgro Sup, Main building, aisle 3, 1st floor, 69280 Marcy-L’Etoile, France. Tel. (+33) 478 87 56 47. Fax. (+33) 472 43 12 23. Email: benoit.cournoyer@vetagro-sup.fr

    Keywords:

    BACtpm, Bacteria, tpm, thiopurine-S-methyltransferase EC:2.1.1.67, Nucleotide sequences, PCR products, Next-Generation-Sequencing, OTHU

    Description:

    • The tpm gene codes for thiopurine-S-methyltransferase (TPMT), an enzyme that can detoxify metalloid-containing oxyanions and xenobiotics (Cournoyer et al., 1998). Bacterial TPMTs diverged from human and animal TPMTs and show a vertical evolution consistent with the 16S rRNA gene molecular phylogeny (Favre‐Bonté et al., 2005).
    • The tpm database, named BACtpm, was designed to apply the tpm-metabarcoding analytical scheme published in Aigle et al. (2021). It includes the full tpm identifiers, GenBank accession numbers, and complete taxonomic records (from domain down to strain code) for tpm sequences of about 215 nucleotides from 840 unique taxa belonging to 139 genera.
    • Nucleotide sequences of tpm (range: 190–233 nucleotides) were either retrieved from public repositories (GenBank) or made available by B. Cournoyer’s research group. Colin et al. (2020) described the PCR and high-throughput Illumina MiSeq DNA sequencing procedures used to produce tpm sequences.
    • BACtpm v.2.0.0 (June 2021 release) is made available under the Creative Commons Attribution 4.0 International Licence. It can be used for the taxonomic allocation of tpm sequences down to the species and strain levels. Data are stored in CSV format, enabling future users to reformat them to fit their specific needs.
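    Because BACtpm is distributed as CSV, it can be reshaped for a given pipeline. The sketch below writes FASTA text whose headers are semicolon-separated taxonomic ranks, loosely following the reference format read by DADA2's `assignTaxonomy`, and drops sequences outside the 190–233 nt range stated above. The column names (`domain` through `species`, and `sequence`) are assumptions, not the documented BACtpm header; check them against the released file, and check the DADA2 or Mothur documentation for the exact format your version expects.

```python
import csv
import io

# Hypothetical column names; verify against the header of the released CSV.
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]
MIN_LEN, MAX_LEN = 190, 233  # tpm length range reported for BACtpm v2.0.0


def csv_to_taxonomy_fasta(csv_text):
    """Return FASTA text whose headers are semicolon-separated rank strings."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        seq = row["sequence"].strip().upper()
        if not (MIN_LEN <= len(seq) <= MAX_LEN):
            continue  # skip records outside the expected amplicon length
        header = ";".join(row[rank] for rank in RANKS) + ";"
        lines.append(f">{header}")
        lines.append(seq)
    return "\n".join(lines) + ("\n" if lines else "")
```

    Keeping the rank order in one list makes it easy to truncate the headers (for example, stopping at genus) if a classifier should not report species or strain assignments.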

    Acknowledgments:

    We thank the worldwide community of microbiologists who have contributed to public databases over the past decades and made the elaboration of the BACtpm database possible. We also thank the Field Observatory in Urban Hydrology (OTHU, www.graie.org/othu/), Labex IMU (Intelligence des Mondes Urbains), the Greater Lyon Urban Community, the School of Integrated Watershed Sciences H2O'LYON, and the Lyon Urban School for their support in the development of this database. This work was funded by the French national research program for environmental and occupational health of ANSES under the terms of project “Iouqmer” EST 2016/1/120, l'Agence Nationale de la Recherche through ANR-16-CE32-0006, ANR-17-CE04-0010, ANR-17-EURE-0018 and ANR-17-CONV-0004, by the MITI CNRS project named Urbamic, and by the French water agency for the Rhône, Mediterranean and Corsica areas through the Desir and DOmic projects.

    Cite as:

    A.C.M. Pozzi, R. Bouchali, L. Marjolet, B. Cournoyer. The tpm metabarcoding DNA sequence database for taxonomic allocations using the Mothur and DADA2 bio-informatic tools (Version 2.0.0), 2021, https://zenodo.org/, BACtpm v2.0.0, doi: 10.5281/zenodo.4492211

    References:

    Aigle, A., Colin, Y., Bouchali, R., Bourgeois, E., Marti, R., Ribun, S., Marjolet, L., Pozzi, A.C.M., Misery, B., Colinon, C., Bernardin-Souibgui, C., Wiest, L., Blaha, D., Galia, W., Cournoyer, B., 2021. Spatio-temporal variations in chemical pollutants found among urban deposits match changes in thiopurine S-methyltransferase-harboring bacteria tracked by the tpm metabarcoding approach. Sci. Total Environ. 767, 145425. https://doi.org/10.1016/j.scitotenv.2021.145425

    Colin, Y., Bouchali, R., Marjolet, L., Marti, R., Vautrin, F., Voisin, J., Bourgeois, E., Rodriguez-Nava, V., Blaha, D., Winiarski, T., Mermillod-Blondin, F., Cournoyer, B., 2020. Coalescence of bacterial groups originating from urban runoffs and artificial infiltration systems among aquifer microbiomes. Hydrol. Earth Syst. Sci. 24, 4257–4273. https://doi.org/10.5194/hess-24-4257-2020

    Cournoyer, B., Watanabe, S., Vivian, A., 1998. A tellurite-resistance genetic determinant from phytopathogenic pseudomonads encodes a thiopurine methyltransferase: evidence of a widely-conserved family of methyltransferases. Biochim. Biophys. Acta BBA - Gene Struct. Expr. 1397, 161–168. https://doi.org/10.1016/S0167-4781(98)00020-7 (International Collaboration accession number of the DNA sequence: L49178.1)

    Favre‐Bonté, S., Ranjard, L., Colinon, C., Prigent‐Combaret, C., Nazaret, S., Cournoyer, B., 2005. Freshwater selenium-methylating bacterial thiopurine methyltransferases: diversity and molecular phylogeny. Environ. Microbiol. 7, 153–164. https://doi.org/10.1111/j.1462-2920.2004.00670.x

  6. Nucleotide Sequence Database

    • bioregistry.io
    • identifiers.org
    Updated Apr 9, 2022
    Cite
    (2022). Nucleotide Sequence Database [Dataset]. https://bioregistry.io/insdc
    Description

    The International Nucleotide Sequence Database Collaboration (INSDC) is a joint effort to collect and disseminate databases of DNA and RNA sequences.

  7. EMBL2checklists: A Python package to facilitate the user-friendly submission of plant and fungal DNA barcoding sequences to ENA

    • plos.figshare.com
    Updated Jun 1, 2023
    Cite
    Michael Gruenstaeudl; Yannick Hartmaring (2023). EMBL2checklists: A Python package to facilitate the user-friendly submission of plant and fungal DNA barcoding sequences to ENA [Dataset]. http://doi.org/10.1371/journal.pone.0210347
    Available download formats: pdf
    Dataset provided by
    PLOS ONE
    Authors
    Michael Gruenstaeudl; Yannick Hartmaring
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: The submission of DNA sequences to public sequence databases is an essential, but insufficiently automated, step in the process of generating and disseminating novel DNA sequence data. Despite the centrality of database submissions to biological research, the range of available software tools that facilitate the preparation of sequence data for database submissions is limited, especially for sequences generated via plant and fungal DNA barcoding. Current submission procedures can be complex and prohibitively time-consuming for any but a small number of input sequences. A user-friendly software tool is needed that streamlines the file preparation for database submissions of DNA sequences that are commonly generated in plant and fungal DNA barcoding.

    Methods: A Python package was developed that converts DNA sequences from the common EMBL and GenBank flat file formats to submission-ready, tab-delimited spreadsheets (so-called ‘checklists’) for subsequent upload to the annotated sequence section of the European Nucleotide Archive (ENA). The software tool, titled ‘EMBL2checklists’, automatically converts DNA sequences, their annotation features, and associated metadata into the idiosyncratic format of marker-specific ENA checklists and thus generates files that can be uploaded via the interactive Webin submission system of ENA.

    Results: EMBL2checklists provides a simple, platform-independent tool that automates the conversion of common DNA barcoding sequences into easily editable spreadsheets that require no further processing other than their upload to ENA via the interactive Webin submission system. The software is equipped with both an intuitive graphical interface and an efficient command-line interface. The utility of the software is illustrated by its application in four recent investigations, including plant phylogenetic and fungal metagenomic studies.

    Discussion: EMBL2checklists bridges the gap between common software suites for DNA sequence assembly and annotation and the interactive data submission process of ENA. It represents an easy-to-use solution for plant and fungal biologists without bioinformatics expertise to generate submission-ready checklists from common DNA sequence data. It allows the post-processing of checklists as well as work-sharing during the submission process, and it solves a critical bottleneck in the effort to increase participation in public data sharing.
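    The core idea, turning annotated flat-file records into rows of a tab-delimited spreadsheet, can be sketched with the Python standard library alone. This is not EMBL2checklists' code or API: it handles only a single GenBank record with LOCUS, ORGANISM, and ORIGIN lines, and the output column names are hypothetical placeholders rather than the field names of any ENA checklist.

```python
import re


def parse_genbank_minimal(text):
    """Extract locus name, organism, and sequence from one GenBank flat-file record."""
    locus = organism = None
    seq_parts = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            locus = line.split()[1]  # second token is the locus name
        elif line.strip().startswith("ORGANISM"):
            organism = line.strip()[len("ORGANISM"):].strip()
        elif line.startswith("ORIGIN"):
            in_origin = True  # the sequence block follows
        elif line.startswith("//"):
            break  # end of record
        elif in_origin:
            # Sequence lines hold a position number and blocks of bases; keep letters only.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
    return {"entry_name": locus, "organism": organism,
            "sequence": "".join(seq_parts).upper()}


def to_tsv(records, columns=("entry_name", "organism", "sequence")):
    """Render parsed records as a tab-delimited table with a header row."""
    rows = ["\t".join(columns)]
    for rec in records:
        rows.append("\t".join(rec.get(col) or "" for col in columns))
    return "\n".join(rows) + "\n"
```

    A real converter also has to map annotation features (gene names, product, marker type) onto the specific columns each ENA checklist requires, which is the part a dedicated tool automates.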

  8. Data_Sheet_2_Cross-sectional use of barcode of life data system and GenBank as DNA barcoding databases for the advancement of museomics.PDF

    • figshare.com
    Updated Jun 13, 2023
    Cite
    Takeru Nakazato; Utsugi Jinbo (2023). Data_Sheet_2_Cross-sectional use of barcode of life data system and GenBank as DNA barcoding databases for the advancement of museomics.PDF [Dataset]. http://doi.org/10.3389/fevo.2022.966605.s002
    Available download formats: pdf
    Dataset provided by
    Frontiers
    Authors
    Takeru Nakazato; Utsugi Jinbo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Museomics is an approach to the DNA sequencing of museum specimens that can generate both biodiversity and sequence information. In this study, we surveyed both the biodiversity information-based database BOLD (Barcode of Life Data System) and the sequence information database GenBank, using DNA barcoding data as an example, with the aim of integrating the data from these two databases. DNA barcoding is a method of identifying species from DNA sequences by using short genetic markers. We surveyed how many entries had biodiversity information (such as links to BOLD and specimen IDs) by downloading all fish, insect, and flowering plant data available from GenBank Nucleotide; a BOLD ID was assigned to 26.2% of the insect entries. In the same way, we downloaded the respective BOLD data and checked the status of links to sequence information. We also investigated how many species these databases cover and found 7,693 species that exist only in BOLD. In the future, as museomics develops as a field, the targeted sequences will be extended not only to DNA barcodes, but also to mitochondrial genomes, other genes, and genome sequences. Consequently, the value of the sequence data will increase. In addition, various species will be sequenced, and thus biodiversity information, such as photographs of the voucher specimens used as a basis for species identification, will become even more indispensable. This study contributes to the acceleration of museomics-associated research by using databases in a cross-sectional manner.
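    A survey of this kind can be sketched by scanning GenBank flat-file text for BOLD cross-references and reporting the share of records that carry one. The sketch assumes, as an illustration, that a link is encoded as a feature qualifier of the form `/db_xref="BOLD:..."`; it is not the authors' survey code, and the function names are hypothetical.

```python
import re

# Assumed encoding of a BOLD link inside a GenBank feature table.
BOLD_XREF = re.compile(r'/db_xref="BOLD:([^"]+)"')


def split_records(flatfile_text):
    """Split concatenated GenBank flat-file text on the '//' record terminator."""
    return [rec for rec in flatfile_text.split("\n//") if rec.strip()]


def bold_linked_fraction(flatfile_text):
    """Fraction of records containing at least one BOLD cross-reference."""
    records = split_records(flatfile_text)
    if not records:
        return 0.0
    linked = sum(1 for rec in records if BOLD_XREF.search(rec))
    return linked / len(records)
```

    Run over a full download of insect records, a return value of 0.262 would correspond to the 26.2% figure reported above; the reverse direction (BOLD records linking to GenBank accessions) would need a similar pass over BOLD's own export format.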

  9. The hsp65 metabarcoding DNA sequence database for taxonomic allocations using the Mothur (Version 1.0.0)

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 26, 2021
    Cite
    Emmanuelle BERGERON (2021). The hsp65 metabarcoding DNA sequence database for taxonomic allocations using the Mothur (Version 1.0.0) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5576073
    Dataset provided by
    Florian VAUTRIN
    Veronica RODRIGUEZ-NAVA
    Emmanuelle BERGERON
    Andrea Faitova
    Delphine MOUNIEE
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The hsp65 gene codes for a heat shock protein (Telenti et al., 1993) and is widespread in the phylum Actinobacteria. It is well suited for species allocation within the genus Nocardia (Rodriguez-Nava et al., 2006).

    The hsp65 database, named ACTIhsp65, was designed to apply the hsp65-metabarcoding analytical scheme published in Vautrin et al. (2021). It includes the full hsp65 identifiers, GenBank accession numbers, and complete taxonomic records (from domain down to strain code) for hsp65 sequences of about 401 nucleotides from 1066 unique taxa belonging to 198 genera.

    Nucleotide sequences of hsp65 (range: 165–565 nucleotides) were either retrieved from public repositories (GenBank) or made available by Veronica Rodriguez-Nava. Vautrin et al. (2021) described the PCR and high-throughput Illumina MiSeq DNA sequencing procedures used to produce hsp65 sequences.

    ACTIhsp65 V1.0.0 (June 2018 release) is made available under the Creative Commons Attribution 4.0 International Licence. It can be used for the taxonomic allocation of hsp65 sequences down to the species level.

  10. GenBank Database

    • cmr.earthdata.nasa.gov
    Updated Apr 20, 2017
    Cite
    (2017). GenBank Database [Dataset]. https://cmr.earthdata.nasa.gov/search/concepts/C1214138025-SCIOPS.html
    Time period covered
    Jan 1, 1970 - Present
    Description

    GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. GenBank (at NCBI), together with the DNA DataBank of Japan (DDBJ) and the European Molecular Biology Laboratory (EMBL) comprise the International Nucleotide Sequence Database Collaboration. These three organizations exchange data on a daily basis.

    GenBank grows at an exponential rate, with the number of nucleotide bases doubling approximately every 14 months. At the time this description was written, GenBank contained more than 13 billion bases from over 100,000 species.
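    A doubling time of about 14 months implies a size of N(t) = N0 · 2^(t/14) after t months, or roughly 1.81-fold growth per year. A small worked example, using the 13-billion-base figure quoted above as the starting size; the projection is arithmetic only, not a statement about GenBank's actual later size, and the function names are illustrative:

```python
DOUBLING_MONTHS = 14  # doubling time quoted in the description


def projected_bases(start_bases, months, doubling_months=DOUBLING_MONTHS):
    """Exponential growth: size doubles every `doubling_months` months."""
    return start_bases * 2 ** (months / doubling_months)


def annual_growth_factor(doubling_months=DOUBLING_MONTHS):
    """Multiplicative growth over 12 months for a given doubling time."""
    return 2 ** (12 / doubling_months)


if __name__ == "__main__":
    print(f"{annual_growth_factor():.3f}")     # 1.811
    print(f"{projected_bases(13e9, 28):.3g}")  # 5.2e+10 (two doublings)
```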

  11. High Throughput Genomic Sequences Division

    • rrid.site
    • scicrunch.org
    Updated Sep 18, 2025
    Cite
    (2025). High Throughput Genomic Sequences Division [Dataset]. http://identifiers.org/RRID:SCR_002150
    Description

    Database of high-throughput genome sequences from large-scale genome sequencing centers, including unfinished and finished sequences. It was created to meet a growing need to make unfinished genomic sequence data rapidly available to the scientific community, in a coordinated effort among the International Nucleotide Sequence Databases: DDBJ, EMBL, and GenBank. Sequences are prepared for submission using NCBI's software tools Sequin or tbl2asn. Each center has an FTP directory into which new or updated sequence files are placed. Sequence data in this division are available for BLAST homology searches against either the htgs database or the month database, which includes all new submissions for the prior month. Unfinished HTG sequences containing contigs greater than 2 kb are assigned an accession number and deposited in the HTG division. A typical HTG record might consist of all the first-pass sequence data generated from a single cosmid, BAC, YAC, or P1 clone, which together make up more than 2 kb and contain one or more gaps. A single accession number is assigned to this collection of sequences, and each record includes a clear indication of its status (phase 1 or 2) plus a prominent warning that the sequence data are unfinished and may contain errors. The accession number does not change as sequence records are updated; only the most recent version of an HTG record remains in GenBank.
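    The accession rules above can be expressed as a small data model: a clone's first-pass contigs qualify for the HTG division when together they exceed 2 kb and contain at least one gap, and an update keeps the same accession while replacing the previous version. The sketch is illustrative only; the class and function names are hypothetical, and treating "more than one contig" as "contains a gap" is a simplifying assumption.

```python
from dataclasses import dataclass, field

MIN_TOTAL_BP = 2000  # "more than 2 kb" threshold from the description


@dataclass
class HtgRecord:
    accession: str
    contigs: list = field(default_factory=list)  # first-pass contig sequences
    version: int = 1


def qualifies_for_htg(contigs):
    """True if contigs total more than 2 kb and are separated by at least one gap."""
    total_bp = sum(len(c) for c in contigs)
    has_gap = len(contigs) > 1  # simplification: separate contigs imply a gap
    return total_bp > MIN_TOTAL_BP and has_gap


def update_record(record, new_contigs):
    """Return the replacement record: same accession, next version number."""
    return HtgRecord(record.accession, list(new_contigs), record.version + 1)
```

    Only the record returned by `update_record` would be kept, mirroring the statement that only the most recent version of an HTG record remains in GenBank.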

  12. NEON (National Ecological Observatory Network) Fish sequences DNA barcode (DP1.20105.001)

    • data.neonscience.org
    Updated Sep 16, 2025
    Cite
    (2025). NEON (National Ecological Observatory Network) Fish sequences DNA barcode (DP1.20105.001) [Dataset]. https://data.neonscience.org/data-products/DP1.20105.001
    Available download formats: zip
    License

    https://www.neonscience.org/data-samples/data-policies-citation

    Time period covered
    Nov 2017 - Dec 2023
    Area covered
    HOPB, WLOU, CUPE, LECO, GUIL, BLDE, POSE, SYCA, TECR, LIRO
    Description

    COI DNA sequences from select fish in lakes and wadeable streams

  13. ZooGene A DNA Sequence Database for Calanoid Copepods and Euphausiids

    • search.dataone.org
    • obis.org
    Updated Sep 16, 2025
    Cite
    Intergovernmental Oceanographic Commission of UNESCO (2025). ZooGene A DNA Sequence Database for Calanoid Copepods and Euphausiids [Dataset]. https://search.dataone.org/view/sha256%3A0ee95b09e809f1f435bf9f3b50cc8fa9b7ae40b6179e740acb00d6b419ae8076
    Dataset provided by
    Ocean Biodiversity Information System (http://www.obis.org/)
    Authors
    Intergovernmental Oceanographic Commission of UNESCO
    Time period covered
    Jan 1, 1989 - Jan 1, 2001
    Description

    Zooplankton genomic (ZooGene) database of DNA type sequences for calanoid copepods and euphausiids.

  14. Overview of the bioinformatic steps involved in submitting novel DNA sequence data to ENA when using EMBL2checklists, starting from assembled DNA sequences

    • plos.figshare.com
    Updated May 30, 2023
    Cite
    Michael Gruenstaeudl; Yannick Hartmaring (2023). Overview of the bioinformatic steps involved in submitting novel DNA sequence data to ENA when using EMBL2checklists, starting from assembled DNA sequences. [Dataset]. http://doi.org/10.1371/journal.pone.0210347.t001
    Available download formats: xls
    Dataset provided by
    PLOS ONE
    Authors
    Michael Gruenstaeudl; Yannick Hartmaring
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview of the bioinformatic steps involved in submitting novel DNA sequence data to ENA when using EMBL2checklists, starting from assembled DNA sequences.

  15. T4-like genome database

    • rrid.site
    • scicrunch.org
    Updated Aug 13, 2025
    Cite
    (2025). T4-like genome database [Dataset]. http://identifiers.org/RRID:SCR_005367
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE (documented August 22, 2016). A database of information on bacteriophages. It contains multiple phage genomes, which users can search with BLAST and MegaBLAST, and it also hosts a Phage Forum in which users can discuss phage data. Completed phage genomes can be browsed interactively: the browser allows users to scan a genome for particular features and to download sequence information plus analyses of those features. Genome views show named genes, BLAST similarities to other phages, predicted tRNAs, and other sequence features.

  16. Accepted species list of Eurotiales, including a DNA sequence reference database, as curated by the International Commission of Penicillium and Aspergillus (ICPA)

    • zenodo.org
    Updated Jul 31, 2025
    Cite
    Cobus M Visagie; David Overy; Jos Houbraken; František Sklenář; Bensch Konstanze; Jens Frisvad; Jonathan Mack; Giancarlo Perrone; Robert A. Samson; Nicole Van Vuuren; Neriman Yilmaz; Vit Hubka (2025). Accepted species list of Eurotiales, including a DNA sequence reference database, as curated by the International Commission of Penicillium and Aspergillus (ICPA) [Dataset]. http://doi.org/10.5281/zenodo.16607355
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Cobus M Visagie; David Overy; Jos Houbraken; František Sklenář; Bensch Konstanze; Jens Frisvad; Jonathan Mack; Giancarlo Perrone; Robert A. Samson; Nicole Van Vuuren; Neriman Yilmaz; Vit Hubka
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Eurotiales is a diverse and speciose order that includes economically important genera such as Aspergillus, Penicillium, Paecilomyces and Talaromyces. Historically, species identification based on morphology has been challenging. The publication of accepted species lists and the availability of representative DNA sequences for type strains have contributed greatly towards accurate species identification and facilitated the description of many new species. However, despite current advancements, a proportion of newly described species within these taxonomically challenging genera in fact represent existing species, which raises obvious concerns.

    This study thus aimed to further modernise the taxonomy of Eurotiales by addressing key challenges in species identification and classification. Our study objectives were threefold: to review species described after 2023, update the accepted species list, and release a curated DNA sequence dataset to facilitate future species identifications. We conclude that a move to a phylogenetic species concept is necessary but continue to support the inclusion of morphological descriptions and, where possible, associated secondary metabolite, exoenzyme, physiology and ecological data when introducing new species.

    Our list now contains 1393 species classified into four families and 26 genera, with Aspergillus (n=465), Penicillium (n=598) and Talaromyces (n=236) containing the most species. To aid sequence-based identifications and species descriptions under a phylogenetic species concept, we release a curated DNA reference sequence database containing 18837 DNA sequences (3867 ITS, 5277 BenA, 5110 CaM and 4583 RPB2) generated from 5325 strains. Sequences were selected to best cover the infraspecies variation under our current understanding of each species. The sequence database will be kept up to date as new information becomes available. This manuscript presents a major leap towards our goal to facilitate work with Eurotiales, while providing the taxonomic framework to support research excellence related to this important fungal group.
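    The figures above are internally consistent, and they imply that each strain contributes on average about 3.5 marker sequences. A quick check in Python, using only numbers stated in the description (the variable names are illustrative):

```python
# Counts taken from the description above.
marker_counts = {"ITS": 3867, "BenA": 5277, "CaM": 5110, "RPB2": 4583}
total_sequences = 18837
strains = 5325
species_total = 1393
largest_genera = {"Aspergillus": 465, "Penicillium": 598, "Talaromyces": 236}

# Per-marker counts should add up to the stated total.
assert sum(marker_counts.values()) == total_sequences

# Average marker sequences per strain (at most 4 if each strain has one per marker).
per_strain = total_sequences / strains
print(f"{per_strain:.2f} sequences per strain")  # 3.54

# Share of accepted species in the three largest genera.
share = sum(largest_genera.values()) / species_total
print(f"{share:.1%} of species")  # 93.3%
```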

    This dataset is curated and kept up to date by the International Commission of Penicillium and Aspergillus (ICPA). If you have questions or suggestions, please get in contact with ICPA members.

  17. DNA DataBank of Japan (DDBJ)

    • neuinfo.org
    • dknet.org
    Updated Mar 24, 2025
    Cite
    (2025). DNA DataBank of Japan (DDBJ) [Dataset]. http://identifiers.org/RRID:SCR_002359
    Description

    Maintains and provides archival, retrieval and analytical resources for biological information. The central DDBJ resource consists of public, open-access nucleotide sequence databases, including raw sequence reads, assembly information and functional annotation. Database content is exchanged with EBI and NCBI within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). In 2011, DDBJ launched two new resources: the DDBJ Omics Archive (DOR) and BioProject. DOR is an archival database of functional genomics data generated by microarrays and highly parallel next-generation sequencers. Data are exchanged between ArrayExpress at EBI and DOR in the common MAGE-TAB format. BioProject provides an organizational framework for accessing metadata about research projects and data from projects that are deposited into different databases.

  18. UNITE - Unified system for the DNA based fungal species linked to the classification

    • demo.gbif.org
    • gbif.org
    Updated Sep 30, 2024
    Cite
    PlutoF (2024). UNITE - Unified system for the DNA based fungal species linked to the classification [Dataset]. http://doi.org/10.15468/mkpcy3
    Dataset provided by
    Global Biodiversity Information Facility (https://www.gbif.org/)
    PlutoF
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    UNITE is an rDNA sequence database designed to provide a stable and reliable platform for sequence-based identification of all fungal species. UNITE provides a unified way of delimiting, identifying, communicating, and working with DNA-based Species Hypotheses (SH). All fungal ITS sequences in the International Nucleotide Sequence Databases (INSD: GenBank, ENA, DDBJ) are clustered to approximately the species level by applying a set of dynamic distance values (0.5–3.0%). Each species hypothesis is given a unique, stable name in the form of a DOI, and its taxonomic and ecological annotations are verified through distributed, web-based third-party annotation efforts. SHs are connected to a taxon name and its classification as far as possible (phylum, class, order, etc.) by taking into account the identifications of all sequences in the SH. An automatically or manually designated sequence is chosen to represent each SH. These sequences are released (https://unite.ut.ee/repository.php) for use by the scientific community in, for example, local sequence similarity searches and next-generation sequencing analysis pipelines. The system and the data are updated automatically as the number of public fungal ITS sequences grows.
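    Clustering at a distance threshold can be illustrated with single-linkage grouping on pairwise p-distances: any two sequences whose proportion of differing sites is at or below the threshold end up in the same cluster. This is a simplified sketch, not the procedure UNITE uses; it assumes the ITS sequences are already aligned to equal length, and the function names are hypothetical.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def single_linkage_clusters(seqs, threshold):
    """Group sequence indices so that pairs within `threshold` share a cluster."""
    parent = list(range(len(seqs)))  # union-find forest over sequence indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if p_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

    Running the same input at 0.015 and at 0.03 shows how the choice of threshold within a range like 0.5–3.0% changes how finely sequences are split. Single linkage can also chain dissimilar sequences together through intermediates, one reason production pipelines use more elaborate clustering.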

  19. BOLD

    • neuinfo.org
    • rrid.site
    Updated Jan 29, 2022
    (2022). BOLD [Dataset]. http://doi.org/10.17616/R3PP7J
    Dataset updated
    Jan 29, 2022
    Description

    DNA barcode data with an online workbench that supports data validation, annotation, and publication for specimen, distributional, and molecular data. The data platform consists of three main modules: a data portal, a database of barcode clusters, and a data collection workbench. The Public Data Portal provides access to all public barcode data, which consist of data generated with the Workbench module as well as data mined from other sources. The Barcode Index Number (BIN) system assigns a unique identifier to each cluster of COI sequences, providing an interim taxonomic system for species in the animal kingdom. The Workbench module integrates secure databases with analytical tools to provide a private, collaborative environment in which researchers can collect, analyze, and publish barcode data and ancillary DNA sequences. The platform also provides an annotation framework that supports tagging and commenting on records and their components (i.e. taxonomy, images, and sequences), allowing community-based validation of barcode data. Through these specialized services, it helps researchers assemble records that meet the standards required for the BARCODE designation in the global sequence databases. Because of its web-based delivery and flexible data security model, it is also well suited to projects that involve broad research alliances. Public data records include record identifiers, taxonomy, specimen details, collection information and sequence data. Data publicly released through BOLD can be retrieved manually through the BOLD public interface or automatically through BOLD web services. BOLD analytical tools are available for any data set in BOLD, including publicly available data, and can be accessed through the BOLD Project Console under the headings Sequences Analysis or Specimen Aggregates. Examples include Taxon ID Tree, Alignment Viewer, Distribution Maps, and Image Library.

  20. Pseudomonas Genome Database

    • neuinfo.org
    • dknet.org
    Updated Jul 8, 2025
    (2025). Pseudomonas Genome Database [Dataset]. http://identifiers.org/RRID:SCR_006590
    Dataset updated
    Jul 8, 2025
    Description

    Database of peer-reviewed, continually updated annotation for the Pseudomonas aeruginosa PAO1 reference strain genome, expanded to include all Pseudomonas species to facilitate high-quality cross-strain and cross-species comparative genomics. The database contains a robust assessment of orthologs and a novel ortholog clustering method, and incorporates five views of the data at the sequence and annotation levels (including GBrowse, Mauve and custom views) to facilitate genome comparisons. Other features include more accurate predictions of protein subcellular localization and a user-friendly, Boolean-searchable log file of updates for the reference strain PAO1. The current annotation is updated using recent research literature and peer-reviewed submissions from a worldwide community of researchers participating in PseudoCAP (Pseudomonas aeruginosa Community Annotation Project); interested researchers are invited to get involved. Annotations, DNA sequences, orthologs, intergenic DNA and protein sequences are available for download.
