Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains 438 records of Trichoptera species from 22 of the 23 families known from the Iberian Peninsula. Specimens were collected between 1975 and 2018 in Portugal, Spain and France (Paupério et al., 2023). Specimens have been identified to species or subspecies level, for a total of 141 species representing 37% of the caddisflies known from the Iberian Peninsula. Specimens were captured during fieldwork directed specifically at sampling Trichoptera using different methodologies and were stored in 96% ethanol. A tissue sample, usually a leg, was collected from each individual, from which DNA was extracted. Sequencing of the 658 bp COI DNA barcode was conducted within the InBIO Barcoding Initiative (IBI), and all DNA sequences were submitted to the BOLD (Barcode of Life Data System) and GenBank databases. Specimens are deposited in the IBI collection at the CIBIO (Research Center in Biodiversity and Genetic Resources, Portugal) or in the Marcos A. González collection at the University of Santiago de Compostela (Spain). All DNA extracts are deposited in the IBI collection.
https://spdx.org/licenses/etalab-2.0.html
Diatoms (Bacillariophyta) are ubiquitous microalgae that produce a siliceous exoskeleton and make a major contribution to the productivity of oceans and freshwaters. They display a huge diversity, which makes them excellent ecological indicators of aquatic ecosystems, and they can also be used to reconstruct paleoenvironments. Usually, diatoms are identified using characteristics of their exoskeleton morphology, which can be time-consuming and error-prone. DNA barcoding is an alternative, and the use of high-throughput sequencing enables the rapid analysis of many environmental samples at a lower cost than analysis by specialist analysts. However, to identify environmental sequences correctly, an expertly curated reference library is needed. Several curated libraries for protists exist; none, however, is dedicated to diatoms. Diat.barcode is an open-access library dedicated to diatoms that has been maintained since 2012. It was initiated with R-Syst, the barcoding network of INRA (French National Institute for Agricultural Research), and is now an international initiative partly supported by a COST network (DNAqua-net). Data come from two sources: (1) the NCBI nucleotide database (National Center for Biotechnology Information) and (2) unpublished sequencing data from culture collections in France, the UK and Russia. Since 2017, several European experts have collaborated to curate this library for rbcL, a chloroplast marker suitable for species-level identification of diatoms. In the latest versions of the database, more than 8100 curated barcodes are available. The database is accessible through https://www6.inra.fr/carrtel-collection_eng/Barcoding-database. A ready-to-use subset of the database for metabarcoding analyses is also accessible.
https://www.neonscience.org/data-samples/data-policies-citation
COI DNA sequences from select fish in lakes and wadeable streams
The dataset contains 412 records of Diptera species from the families Limoniidae, Pediciidae and Tipulidae, collected between 2003 and 2019 in Portugal, including the Azores and Madeira archipelagos (Ferreira et al., 2021; Oosterbroek et al., 2020; Starý, 2014). Specimens have been identified to species or subspecies level, for a total of 83 species representing 58% of the craneflies known from Portugal. Specimens were captured during fieldwork directed specifically at sampling Diptera using different methodologies, and the majority were stored in 96% ethanol. A tissue sample, usually a leg, was collected from each individual, from which DNA was extracted. The DNA barcoding of these specimens was conducted within the InBIO Barcoding Initiative (IBI), funded by the EnvMetaGen and PORBIOTA projects. The dataset “DNA barcodes of Portuguese Diptera 02 - Limoniidae, Pediciidae and Tipulidae” is part of a group of Diptera datasets published by IBI, the first of which, “The InBIO Barcoding Initiative Database: Diptera 01” (https://doi.org/10.15468/q1bvt3), has already been made available through GBIF. DNA barcode sequences were deposited in the BOLD (Barcode of Life Data System) and GenBank databases. All DNA extracts are deposited in the IBI collection at the CIBIO (Research Center in Biodiversity and Genetic Resources).
The dataset contains 203 records of Diptera species collected from 2014 to 2018 in continental Portugal (Ferreira et al., 2020). The species represented in the dataset, 154 in total, correspond to about 10% of the known fly diversity of continental Portugal and contribute to the knowledge of the DNA barcodes and distribution of Portuguese Diptera. Specimens were captured during fieldwork directed specifically at sampling Diptera using different methodologies and were stored in 96% ethanol. All specimens were morphologically identified to species level. A tissue sample, usually a leg, was collected from each individual, from which DNA was extracted. The DNA barcoding of these specimens was conducted within the InBIO Barcoding Initiative (IBI), funded by the EnvMetaGen and PORBIOTA projects. DNA barcode sequences were deposited in the BOLD (Barcode of Life Data System) online database. Preserved specimens and DNA extracts are deposited in the IBI collection at the CIBIO (Research Center in Biodiversity and Genetic Resources).
The dataset contains 71 records of Plecoptera specimens collected from 2004 to 2018 in the Iberian Peninsula (Ferreira et al., 2020). Twenty-nine stonefly species are represented in the dataset, contributing to the knowledge of the DNA barcodes and distribution of Plecoptera in Iberia. Specimens were captured during fieldwork directed specifically at sampling Plecoptera using different methodologies and were stored in 96% ethanol. All specimens were morphologically identified to species level. A tissue sample, usually a leg, was collected from each individual, from which DNA was extracted. The DNA barcoding of these specimens was conducted within the InBIO Barcoding Initiative (IBI), funded by the EnvMetaGen and PORBIOTA projects. DNA barcode sequences were deposited in the BOLD (Barcode of Life Data System) online database. Preserved specimens and DNA extracts are deposited in the IBI collection at the CIBIO (Research Center in Biodiversity and Genetic Resources).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Building DNA barcode databases for plants has historically been ad hoc, and often with a relatively narrow taxonomic focus. To realize the full potential of DNA barcoding for plants, and particularly its application to metabarcoding for mixed-species environmental samples, systematic sequencing of reference collections is required using an augmented set of DNA barcode loci, applied according to agreed data generation and analysis standards. The largest and most complete reference collections of plants are held in herbaria. Australia has a globally significant flora that is well sampled and expertly curated by its herbaria, coordinated through the Council of Heads of Australasian Herbaria. There exists a tremendous opportunity to provide a comprehensive and taxonomically robust reference database for plant DNA barcoding applications by undertaking coordinated and systematic sequencing of the entire flora of Australia utilizing existing herbarium material. In this paper, we review the development of DNA barcoding and metabarcoding and consider the requirements for a robust and comprehensive system. We analyzed the current availability of DNA barcode reference data for Australian plants, recommend priority taxa for database inclusion, and highlight future applications of a comprehensive metabarcoding system. We urge that large-scale and coordinated analysis of herbarium collections be undertaken to realize the promise of DNA barcoding and metabarcoding, and propose that the generation and curation of reference data should become a national investment priority.
https://www.neonscience.org/data-samples/data-policies-citation
COI DNA sequences from select mosquitoes
The capacity to identify an unknown organism using the DNA sequence from a single gene has many applications. These include the development of biodiversity inventories (Janzen et al. 2005), forensics (Meiklejohn et al. 2011), biosecurity (Armstrong and Ball 2005), and the identification of cryptic species (Smith et al. 2006). The popularity and widespread use (Teletchea 2010) of the DNA barcoding approach (Hebert et al. 2003), despite broad misgivings (e.g., Smith 2005; Will et al. 2005; Rubinoff et al. 2006), attest to this. However, one major shortcoming to the standard barcoding approach is that it assumes that gene trees and species trees are synonymous, an assumption that is known not to hold in many cases (Pamilo and Nei 1988; Funk and Omland 2003). Biological processes that violate this assumption include incomplete lineage sorting and interspecific hybridization (Funk and Omland 2003). Indeed, simulation studies indicate that the concatenation approach (in which these two proces...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The DNA barcoding results, consisting of sequences for 269 specimens, were used for species identification and species richness assessment by comparing our data with the data available in the BOLD and NCBI databases, and/or by applying species delimitation methods. The files provided are the alignments used for species delimitation of Collembola and Oligochaeta specimens. The methods applied were as follows. Sequencing of purified DNA amplicons was performed by Macrogen Inc. (Amsterdam, Netherlands) using the same amplification primer (LCO1490). Sequences were edited using BIOEDIT v.7.2. (Gene Codes Corporation, Ann Arbor, MI USA) [Hall 1999]. Chromatograms were manually checked for ambiguous nucleotides, stop codons and indels in BIOEDIT v.7.2. Sequences of specimens, including sequences from BOLD, were grouped and aligned in MEGA X [Kumar et al. 2018] using MUSCLE [Edgar 2004]. After alignment, sequences were collapsed to unique haplotypes (haploid genotypes) using FaBox (1.5) [Villesen 2007]. The DNA sequence alignments were checked for stop codons using Mesquite ver. 3.5 [Maddison & Maddison 2019].
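The last two steps described above (collapsing aligned sequences to unique haplotypes and screening for stop codons) can be illustrated with a minimal Python sketch. It is not the FaBox or Mesquite implementation. It assumes that the alignment is held as a dictionary of equal-length strings, that the reading frame starts at the first alignment column unless stated otherwise, and that the invertebrate mitochondrial genetic code (NCBI translation table 5, where TAA and TAG are the stop codons) applies to the marker. The function names are hypothetical.

```python
from collections import defaultdict

# Assumption: invertebrate mitochondrial code (NCBI translation table 5),
# in which TAA and TAG are stops while TGA encodes tryptophan.
STOP_CODONS = {"TAA", "TAG"}


def collapse_haplotypes(aligned):
    """Group sequence IDs that share an identical aligned sequence.

    `aligned` maps sequence ID -> aligned sequence (gaps as '-').
    Returns a dict mapping each unique sequence to the list of IDs carrying it.
    """
    groups = defaultdict(list)
    for seq_id, seq in aligned.items():
        groups[seq.upper()].append(seq_id)
    return dict(groups)


def has_stop_codon(seq, frame=0):
    """Return True if a complete in-frame codon of `seq` is a stop codon.

    Codons are read in alignment coordinates starting at `frame`;
    codons that contain a gap character never match and are ignored.
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)


if __name__ == "__main__":
    alignment = {"s1": "ATGGCATTA", "s2": "atggcatta", "s3": "ATGTAAGCA"}
    for haplotype, ids in collapse_haplotypes(alignment).items():
        flag = "stop codon found" if has_stop_codon(haplotype) else "ok"
        print(haplotype, ids, flag)
```

Reading codons in alignment coordinates keeps the frame consistent across sequences, at the cost of skipping any codon interrupted by a gap; a stricter check would treat such codons as suspect rather than ignore them.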
This data product contains the quality-controlled laboratory metadata and QA results for NEON's cytochrome oxidase I (COI) barcoding of fish sequences. Fin clips are taken from a subset of collected fish for DNA analysis. The DNA barcoding procedure involves removing tissue, extracting and sequencing DNA from the tissue, and matching that sequence data to sequences from previously identified voucher specimens. DNA analysis serves a number of purposes, including verification of the taxonomy of specimens that do not receive expert identification, clarification of the taxonomy of rare or cryptic species, and characterization of diversity using molecular markers. For additional details, see the user guide, protocols, and science design listed in the Documentation section of this data product's details webpage. Queries for this data product will return metadata tables formatted for submission to the Barcode of Life Database. These queries will also provide links to the actual sequence data, which are publicly available on the Barcode of Life Data System (BOLD, http://www.barcodinglife.com/). The sequence data can be obtained by following the links from the NEON data portal, or by directly querying NEON data sets on the BOLD server. From the NEON portal, the link "BOLD Project: Fish sequences DNA barcode" redirects to a page on the BOLD public data portal for the queried data. This is a dynamic link and will automatically update based on the user query. Latency: The expected time from data and/or sample collection in the field to data publication is as follows (in days) for each of the data tables in the downloaded data package. See the Data Product User Guide for more information. fsh_BOLDcollectionData: 390; fsh_BOLDspecimenDetails: 390; fsh_BOLDtaxonomy: 390; fsh_BOLDvoucherInfo: 390. Fin clips will be collected from 5-10 individuals of a target species. These tissues will be preserved in an appropriate tissue vial and shipped to an external lab. DNA will be extracted and target sequences amplified via PCR. Barcodes of cytochrome oxidase I will be generated per specimen.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Museomics is an approach to the DNA sequencing of museum specimens that can generate both biodiversity and sequence information. In this study, we surveyed both the biodiversity information-based database BOLD (Barcode of Life System) and the sequence information database GenBank, using DNA barcoding data as an example, with the aim of integrating the data from these two databases. DNA barcoding is a method of identifying species from DNA sequences by using short genetic markers. We surveyed how many entries had biodiversity information (such as links to BOLD and specimen IDs) by downloading all fish, insect, and flowering plant data available from GenBank Nucleotide; a BOLD ID was assigned to 26.2% of the insect entries. In the same way, we downloaded the respective BOLD data and checked the status of links to sequence information. We also investigated how many species these databases cover, and 7,693 species were found to exist only in BOLD. In the future, as museomics develops as a field, the targeted sequences will be extended beyond DNA barcodes to mitochondrial genomes, other genes, and genome sequences. Consequently, the value of the sequence data will increase. In addition, various species will be sequenced and, thus, biodiversity information, such as the evidence specimen photographs used as a basis for species identification, will become even more indispensable. This study contributes to the acceleration of museomics-associated research by using databases in a cross-sectional manner.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Combined data set derived from new data generated herein and publicly available DNA barcoding projects from the Barcode of Life Database.
The dataset contains 234 records of lacewing (Neuroptera) species collected from 2006 to 2019 in continental Portugal (Oliveira et al., 2021). Specimens were detected and captured by direct search of the environment and by using both UV and mercury-vapor lamps to attract the insects. Captured specimens were preserved in 96% ethanol. All captured specimens were identified to species level. Samples of each species were selected for DNA sequencing based on their geographic provenance. From each specimen, one leg (tissue sample) was removed to be used for DNA extraction. The DNA barcoding of these specimens was conducted within the InBIO Barcoding Initiative (IBI), funded by the EnvMetaGen and PORBIOTA projects. DNA barcode sequences were deposited in the BOLD (Barcode of Life Data System) and GenBank databases. All specimens and DNA extracts are deposited in the IBI collection at the CIBIO (Research Center in Biodiversity and Genetic Resources).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the associated data, workflow and additional code to prepare the publication of taxalogue. The results of the study can be inspected and reproduced using this data.
The benchmark.zip folder contains the code, data, figures and workflow for the case study of the publication.
The ref_db_taxalogue.zip contains the code, data, figures and workflow that were used to filter the taxalogue reference database. The file data.tar.gz contains the final taxalogue reference database + statistics and parameters used. The code to produce a reference database is available at https://github.com/nwnoll/taxalogue
References:
Keller et al. (2019) BCdatabaser: on-the-fly reference database creation for (meta-)barcoding. EcoEvoRxiv (2019); https://doi.org/10.32942/osf.io/cmfu2 - https://github.com/molbiodiv/bcdatabaser
eutils: Sayers E. E-utilities Quick Start. 2008 Dec 12 [Updated 2018 Oct 24]. In: Entrez Programming Utilities Help [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2010-. Available from: https://www.ncbi.nlm.nih.gov/books/NBK25500/
NCBI::Taxonomy: Förster et al. https://github.com/greatfireball/NCBI-Taxonomy - Förster F. greatfireball/NCBI-Taxonomy v0.90. Zenodo. 2018 Oct 15; https://doi.org/10.5281/zenodo.1462861
SeqFilter: Hackl et al. https://github.com/BioInf-Wuerzburg/SeqFilter
dispr: Cofield et al. https://github.com/douglasgscofield/dispr
Krona: Ondov BD, Bergman NH, and Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011 Sep 30; 12(1):385.
Parameters: --marker-search-string ITS2 OR 'internal transcribed spacer2' --taxonomic-range Viridiplantae --sequence-length-filter 100:2000 --sequences-per-taxon 9
This dataset was automatically created with data from NCBI using the BCdatabaser tool.
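To illustrate what the length filter (100:2000) and the per-taxon cap (9) in the parameters above imply for the resulting reference set, the following Python sketch applies both rules to a list of (taxon, sequence) pairs. It is an illustration only and not BCdatabaser's code: the function name, the inclusive length bounds and the first-come selection of sequences within a taxon are assumptions, since the description does not state how the tool picks which sequences to keep.

```python
from collections import defaultdict


def filter_records(records, min_len=100, max_len=2000, per_taxon=9):
    """Keep sequences within the length range, at most `per_taxon` per taxon.

    `records` is an iterable of (taxon, sequence) pairs.
    Assumptions (not taken from BCdatabaser): the bounds are inclusive and
    the first qualifying sequences encountered for each taxon are kept.
    """
    kept = []
    counts = defaultdict(int)
    for taxon, seq in records:
        if not (min_len <= len(seq) <= max_len):
            continue  # outside the 100:2000 length window
        if counts[taxon] >= per_taxon:
            continue  # this taxon already has its quota
        counts[taxon] += 1
        kept.append((taxon, seq))
    return kept


if __name__ == "__main__":
    # Hypothetical toy input: 12 usable sequences for one taxon, one too short.
    toy = [("Taxon A", "ACGT" * 30)] * 12 + [("Taxon B", "ACGT")]
    print(len(filter_records(toy)))
```

Capping the number of sequences per taxon keeps heavily sequenced taxa from dominating the reference set, while the length window discards fragments and over-long records that are unlikely to represent the marker alone.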
The dataset contains 135 records of Hemiptera species collected from 2015 to 2019 in continental Portugal (Sousa et al., 2021). The species represented in the dataset, 90 in total, correspond to about 7.5% of the known true bug diversity of continental Portugal and contribute to the knowledge of the DNA barcodes and distribution of Portuguese Hemiptera. Specimens were captured during fieldwork by direct search or by sweeping the vegetation and were stored in 96% ethanol. All specimens were identified to species level. A tissue sample, usually a leg, was collected from each individual, from which DNA was extracted. The DNA barcoding of these specimens was conducted within the InBIO Barcoding Initiative (IBI), funded by the EnvMetaGen and PORBIOTA projects. DNA barcode sequences were deposited in the BOLD (Barcode of Life Data System) online database. Preserved specimens and DNA extracts are deposited in the IBI collection at the CIBIO (Research Center in Biodiversity and Genetic Resources).
https://www.neonscience.org/data-samples/data-policies-citation
COI DNA sequences from select ground beetles
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Demultiplexed Illumina sequence data (COI) from 450 invertebrate specimens, related to the paper "Fast-tracking bespoke DNA reference database generation from museum collections for biomonitoring and conservation" by Andrew Dopheide, Talia Brav-Cubitt, Anastasija Podolyan, Richard A. B. Leschen, Darren Ward, Thomas R. Buckley, and Manpreet K. Dhami.
The dataset contains five records of the alderfly genus Sialis Latreille, 1803 (Megaloptera, Sialidae) collected in 2011 and 2015 in northern continental Portugal. The study of these specimens found two species previously unknown in the country, S. lutaria Linnaeus, 1758 and S. nigripes Pictet, 1865, and confirmed the presence of S. fuliginosa Pictet, 1836 in Portugal (Ferreira et al. 2019). In that work, the three species were identified morphologically and confirmed with DNA barcodes. Specimens were detected by direct search on vegetation and rocks around river streams and captured with a hand-net. Captured specimens were identified to species level and preserved in 96% ethanol. A tissue sample, a leg, was collected from each individual, from which DNA was extracted. The DNA barcoding of these specimens was conducted within the InBIO Barcoding Initiative (IBI), funded by the EnvMetaGen and PORBIOTA projects. DNA barcode sequences were deposited in the BOLD (Barcode of Life Data System) and GenBank online databases. DNA extracts are deposited in the IBI collection at the CIBIO (Research Center in Biodiversity and Genetic Resources).