Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset description
This item contains COI (mitochondrial cytochrome oxidase subunit I) sequences collected from the BOLD database. The fasta file bold_clustered_cleaned.fasta.gz has record ids that can be queried in the Public Data Portal, and each fasta header contains the taxonomic ranks plus the BIN ID assigned to the record. The taxonomic information for each record is also given in the tab-separated file bold_info_filtered.tsv.gz.
The file bold_clustered.sintax.fasta.gz is directly compatible with the SINTAX algorithm in vsearch, while the files bold_clustered.assignTaxonomy.fasta.gz and bold_clustered.addSpecies.fasta.gz are directly compatible with the assignTaxonomy and addSpecies functions from DADA2, respectively. The dataset was last created on December 16, 2022.
NOTE: We have noticed that the gzipped files in this upload have been compressed twice for some reason. A quick fix is to unzip any file with a ".gz" extension, rename the unzipped file by adding the ".gz" extension back, and then run the unzipping once again. Sorry for the inconvenience.
Methods
The code used to generate this dataset consists of a snakemake workflow wrapped into a python package that can be installed with conda (conda install -c bioconda coidb). First, sequence and taxonomic information for records in the BOLD database is downloaded from the GBIF Hosted Datasets. This data is then filtered to keep only records annotated as 'COI-5P' and assigned to a BIN ID. The taxonomic information is parsed in order to assign species names and resolve higher-level ranks for each BIN ID. Sequences are processed to remove gap characters and leading and trailing Ns. After this, any sequences with remaining non-standard characters are removed. Sequences are then clustered at 100% identity using vsearch (Rognes et al. 2016). This clustering is done separately for the sequences assigned to each BIN ID.
For more information, see https://github.com/biodiversitydata-se/coidb
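
The double-compression workaround and the sequence cleaning steps described above can be sketched in Python roughly as follows. This is only an illustration, not the coidb implementation: the function names are hypothetical, the gap character is assumed to be "-", and "standard characters" is assumed to mean A, C, G and T.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def gunzip_all_layers(data: bytes) -> bytes:
    """Decompress repeatedly for as long as the payload still looks gzipped."""
    while data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
    return data


def clean_sequence(seq: str):
    """Apply the cleaning steps in the order given in Methods.

    Returns the cleaned sequence, or None if the record should be dropped.
    """
    seq = seq.upper().replace("-", "")  # remove gap characters (assumed "-")
    seq = seq.strip("N")                # remove leading and trailing Ns
    if not seq or set(seq) - set("ACGT"):
        return None                     # empty, or non-standard characters remain
    return seq
```

For a downloaded file, gunzip_all_layers(open(path, "rb").read()) would return the plain FASTA or TSV bytes regardless of how many times the file was compressed, since uncompressed text in these formats does not begin with the gzip magic bytes.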
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COInr is a non-redundant, comprehensive database of COI sequences extracted from NCBI-nt and BOLD. It is not limited to a taxon, a gene region, or a taxonomic resolution. Sequences are dereplicated between databases and within taxa.
Each taxon has a unique taxonomic identifier (taxID), which is fundamental to avoiding ambiguous associations of homonyms and synonyms in the source databases. TaxIDs form a coherent hierarchical system fully compatible with the NCBI taxIDs, which allows full or ranked lineages to be created for each taxon.
COInr is a good starting point to create custom databases according to the users’ needs using mkCOInr scripts available at https://github.com/meglecz/mkCOInr
It is possible to select/eliminate sequences for a list of taxa, select a specific gene region, select for minimum taxonomic resolution, add new custom sequences, and format the database for BLAST, QIIME, RDP classifiers.
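
To illustrate how a hierarchical taxID system yields full or ranked lineages, here is a minimal Python sketch. It is not taken from mkCOInr: the table below is an invented toy taxonomy with made-up taxIDs, and the function names and the list of ranks are assumptions. The only convention borrowed from NCBI is that the root node is its own parent.

```python
# Toy taxonomy (invented taxIDs): taxID -> (parent taxID, rank, name)
TAXONOMY = {
    1: (1, "no rank", "root"),
    10: (1, "kingdom", "Metazoa"),
    20: (10, "phylum", "Arthropoda"),
    25: (20, "no rank", "Pancrustacea"),
    30: (25, "class", "Insecta"),
}
RANKS = ("kingdom", "phylum", "class")  # ranks kept in a ranked lineage


def full_lineage(taxid, taxonomy=TAXONOMY):
    """Walk parent links up to the root; return (taxID, rank, name) from root down."""
    lineage = []
    while True:
        parent, rank, name = taxonomy[taxid]
        lineage.append((taxid, rank, name))
        if parent == taxid:  # the root points to itself
            break
        taxid = parent
    return lineage[::-1]


def ranked_lineage(taxid, ranks=RANKS, taxonomy=TAXONOMY):
    """Keep only the requested ranks; missing ranks become empty strings."""
    by_rank = {rank: name for _, rank, name in full_lineage(taxid, taxonomy)}
    return [by_rank.get(rank, "") for rank in ranks]
```

Because every taxon is addressed by its taxID rather than its name, two taxa that share a name would still resolve to different lineages.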
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COins is a database of COI-5P sequences of insects that includes over 532,000 representative sequences of more than 106,000 species, specifically formatted for the QIIME2 software platform. It was developed through a combination of automated and manually curated steps, starting from the insect COI sequences available in the Barcode of Life Data System and selecting sequences that comply with several standards, including a species-level identification.
seq-degapped.qza --> reference sequences
taxonomy.qza --> sequences taxonomy
SklearnClassifier_COins_QIIME2_v2024.5.qza (NEW!) --> naïve Bayes taxonomic classifier trained on COins (QIIME2 version 2024.5)
SklearnClassifier_COins_QIIME2_v2023.5.qza --> naïve Bayes taxonomic classifier trained on COins (QIIME2 version 2023.5)
SklearnClassifier_COins_QIIME2_v2022.2.qza --> naïve Bayes taxonomic classifier trained on COins (QIIME2 version 2022.2)
Sequences_metadata1.tsv --> Identification procedure of the voucher specimens from which reference sequences were developed. The identification procedure is reported for each sequence included in COins (BOLD id reported in the BOLDid reference column) and for all identical sequences within haplotypes that were removed at Step 5 of COins curation (those for which a BOLD id is not available in the BOLDid reference column). The haplotype to which each sequence belongs is reported in the Haplotype column (haplotypes of each species are labeled with increasing numbers). Identification procedure information is derived from the sequence-associated metadata provided by the BOLD system.
Sequences_metadata2.tsv --> Identical sequences belonging to different species present within COins. Each row represents a cluster of identical sequences associated with different species; sequences included in the cluster are labeled with species name and BOLD id.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is mitochondrial Cytochrome c Oxidase I gene (COI) metabarcoding data of surface seawater metazoan communities from three distinct locations in the rocky intertidal, Pillar Point, Half Moon Bay, California, USA that were sampled over one tidal exposure period on 28 January 2022. This work is associated with a publication in Environmental DNA (https://doi.org/10.1002/edn3.521).
[This dataset was processed using the GBIF eDNA converter tool.]
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The increasing popularity of cytochrome c oxidase subunit 1 (COI) DNA metabarcoding warrants a careful look at the underlying reference databases used to make high-throughput taxonomic assignments. The objectives of this study are to document trends and assess the future usability of COI records for metabarcode identification. The number of COI records deposited to the NCBI nucleotide database has increased by a geometric average of 51% per year, from 8,137 records deposited in 2003 to a cumulative total of ~ 2.5 million by the end of 2017. About half of these records are fully identified to the species rank, 92% are at least 500 bp in length, 74% have a country annotation, and 51% have latitude-longitude annotations. To ensure the future usability of COI records in GenBank we suggest: 1) Improving the geographic representation of COI records, 2) Improving the cross-referencing of COI records in the Barcode of Life Data System and GenBank to facilitate consolidation and incorporation into existing bioinformatic pipelines, 3) Adherence to the minimum information about a marker gene sequence guidelines, and 4) Integrating metabarcodes from eDNA and mixed community studies with existing reference sequences. The growth of COI reference records over the past 15 years has been substantial and is likely to be a resource across many fields for years to come.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Distribution of COI records in BOLD.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Environmental DNA sequence data of invasive species along the harbors in the northern regions of Västra Götaland. Sequence data were obtained from both water samples and plankton samples from two sampling events, one in the summer and one in the autumn of 2024. The amplicons were amplified with the COI Leray-XT primers (Wangensteen et al. 2018) and annotated against an in-house database of invasive and non-indigenous species of Norway, Sweden and the EEA created with EchoPipe (Stensrud et al. in press). The data were created for Länsstyrelsen Västra Götaland and financed through HaV.
This dataset was published via the SBDI ASV portal. (https://asv-portal.biodiversitydata.se/)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COI sequence database for tardigrade classification used in the manuscript "Occurrence of tardigrade populations in holes on concrete blocks", Zoological Studies (accepted, 2025).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DNA barcoding is a method of identifying individual organisms using short DNA fragments matched to a database of reference sequences. For metazoan plankton, a high proportion of species that reside in the deep ocean still lack reliable reference sequences for genetic markers for barcoding and systematics. We report on substantial taxonomic and barcoding efforts across major zooplankton taxonomic groups collected from surface waters to the rarely sampled abyssopelagic zone (0 – 4300 m) from the Gulf of Alaska, North Pacific Ocean. Over 1000 specimens were identified, from which the mitochondrial 16S and COI and nuclear 18S rRNA genes were sequenced. In total, 1462 sequences for 254 unique taxa were generated, adding new barcodes for 107 species, including 12 undescribed species of cnidarians, that previously lacked DNA sequences for at least one of the three genes. Additionally, we introduce the use of a new Open Nomenclature qualifier deoxyribonucleic acid abbreviation DNA (e.g., Genus DNA species, DNA Genus). This qualifier was used for specimens that could not be morphologically identified but could be assigned a low-level taxonomic identification based on the clustering of DNA barcode genes using phylogenetic trees (100% bootstrap support), where at least one of the sequences in that clade could be referred to a physical specimen (or photographs) where identification could be corroborated through morphological analyses. DNA barcodes from this work are incorporated into the MetaZooGene Atlas and Database, an open-access data and metadata portal for barcoding genes used for classifying and identifying marine organisms. As environmental sequencing (i.e., metabarcoding, metagenetics, and eDNA) becomes an increasingly common approach in marine ecosystem studies, continued population of such reference DNA sequence databases must remain a high priority.
This dataset contains DNA sequences of two genomic regions, cytochrome oxidase subunit one and internal transcribed spacer region one, obtained from Tetracapsuloides bryosalmonae infecting the tissue of fish collected from various locations throughout western and central Montana, USA.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Parameters:
International Barcode of Life project (iBOL) Data Packages for animals, releases 0.5-6.00.
https://www.neonscience.org/data-samples/data-policies-citation
COI DNA sequences from select mosquitoes
https://spdx.org/licenses/CC0-1.0.html
Environmental DNA (eDNA) methods complement traditional monitoring and can be configured to detect multiple species simultaneously. One such approach, eDNA metabarcoding, uses high-throughput DNA sequencing to indirectly detect many different organisms, spanning broad taxonomic boundaries, from water samples. We are optimizing a non-invasive, low-cost eDNA metabarcoding protocol to be used in conjunction with existing monitoring programs. One resource that is currently lacking for metabarcoding studies in general, including those in the San Francisco Estuary (SFE), is a comprehensive database of DNA barcode reference sequences. Without this foundational data, many species go undetected or are misidentified in metabarcoding studies. To meet this need, we generated a custom barcode sequence database for the SFE by DNA sequencing and by mining public DNA sequence data for estuarine and freshwater species of interest to monitoring programs and ecological studies. Here we present custom reference sequence databases for three barcodes: Cytochrome C Oxidase I (COI), 12S MiFish and 16S.
Methods
Data were collected from two sources. Specimens of fish and invertebrates collected from the San Francisco Estuary were used for Sanger DNA sequencing. DNA extractions were performed using the Qiagen Blood and Tissue kit, and PCR was performed using primers to amplify the entire barcode sequence. Raw chromatogram data files were manually examined for quality control and aligned, and flanking and primer sequences were trimmed using CodonCode Aligner. For species without physical specimens, or for those specimens that failed PCR/sequencing/QC, publicly available DNA sequences were downloaded from GenBank, then aligned and trimmed to the barcode region using CodonCode Aligner. The combined experimental and downloaded sequences for each barcode were placed into a single .txt file formatted for use with the DADA2 metabarcoding software. For all sequences, an additional verification step was performed by querying the BLASTn database. A separate metadata file (.csv) was also generated for each barcode that includes the specimen name (if applicable), GenBank accession numbers (if applicable), taxonomic information, common name, and specimen locality, US state, and collection date, if available.
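
As a rough idea of what "formatted for use with DADA2" can look like, the Python sketch below writes one record in the layout expected by DADA2's assignTaxonomy training files: a FASTA header listing the taxonomic levels separated by semicolons, followed by the sequence. The function name, the choice of taxonomic levels and the example record are assumptions for illustration and are not taken from the dataset.

```python
def dada2_record(taxa, sequence):
    """Format one reference entry as '>Level1;Level2;...;' plus the sequence line."""
    for taxon in taxa:
        if ";" in taxon or not taxon:
            raise ValueError(f"invalid taxon name: {taxon!r}")
    header = ">" + ";".join(taxa) + ";"
    return f"{header}\n{sequence.upper()}\n"


# Illustrative record; the sequence is a placeholder, not real barcode data.
example = dada2_record(
    ["Animalia", "Chordata", "Actinopterygii", "Osmeriformes", "Osmeridae", "Hypomesus"],
    "acgtacgt",
)
```

Rejecting semicolons inside taxon names keeps the header unambiguous, since the semicolon is the level separator.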
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This composition appears in the Co-I region of phase space. Its relative stability is shown in the Co-I phase diagram (left). The relative stability of all other phases at this composition (and the combination of other stable phases, if no compound at this composition is stable) is shown in the relative stability plot (right).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data obtained from DFT calculations on orthorhombic CoI are provided. Available data include crystal structure, bandgap energy, stability, density of states, and calculation input/output files.
https://www.neonscience.org/data-samples/data-policies-citation
COI DNA sequences from select ground beetles
https://www.neonscience.org/data-samples/data-policies-citation
COI DNA sequences from select fish in lakes and wadeable streams
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Deep-sea fishes were collected in the northern Gulf of Mexico during DEEPEND cruise DP05 from May 1 to 11, 2017. This dataset contains GenBank accession numbers of DNA sequences of the mitochondrial cytochrome c oxidase I (COI) gene from the fish species collected.