100+ datasets found
  1. COI reference sequences from BOLD DB

    • figshare.scilifelab.se
    • researchdata.se
    • +1 more
    application/gzip
    Updated Apr 14, 2025
    John Sundh (2025). COI reference sequences from BOLD DB [Dataset]. http://doi.org/10.17044/scilifelab.20514192.v4
    Dataset provided by
    Swedish Museum of Natural History
    Authors
    John Sundh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset description

    This item contains COI (mitochondrial cytochrome oxidase subunit I) sequences collected from the BOLD database. The FASTA file bold_clustered_cleaned.fasta.gz has record IDs that can be queried in the Public Data Portal, and each FASTA header contains the taxonomic ranks plus the BIN ID assigned to the record. The taxonomic information for each record is also given in the tab-separated file bold_info_filtered.tsv.gz.

    The file bold_clustered.sintax.fasta.gz is directly compatible with the SINTAX algorithm in vsearch, while the files bold_clustered.assignTaxonomy.fasta.gz and bold_clustered.addSpecies.fasta.gz are directly compatible with the assignTaxonomy and addSpecies functions from DADA2, respectively. The dataset was last created on December 16, 2022.

    NOTE: We have noticed that the gzipped files in this upload have been compressed twice for some reason. A quick fix is to unzip any file with a ".gz" extension, rename the unzipped file by adding the ".gz" extension back, and then unzip it once again. Sorry for the inconvenience.

    Methods

    The code used to generate this dataset consists of a snakemake workflow wrapped into a Python package that can be installed with conda (conda install -c bioconda coidb). First, sequence and taxonomic information for records in the BOLD database is downloaded from the GBIF Hosted Datasets. These data are then filtered to keep only records annotated as 'COI-5P' and assigned to a BIN ID. The taxonomic information is parsed in order to assign species names and resolve higher-level ranks for each BIN ID. Sequences are processed to remove gap characters and leading and trailing Ns. After this, any sequences with remaining non-standard characters are removed. Sequences are then clustered at 100% identity using vsearch (Rognes et al. 2016). This clustering is done separately for the sequences assigned to each BIN ID.

    For more information, see https://github.com/biodiversitydata-se/coidb
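    The double-compression note and the sequence clean-up step described above can be sketched in Python as follows. This is a minimal illustration, not the coidb implementation: the function names are hypothetical, the gap character is assumed to be "-", and "standard characters" is assumed to mean A, C, G and T only.

```python
import gzip
import re

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def gunzip_fully(data: bytes, max_layers: int = 5) -> bytes:
    """Decompress repeatedly while the payload still looks like gzip.

    Handles files that were compressed twice; max_layers is a safety cap.
    """
    layers = 0
    while data[:2] == GZIP_MAGIC and layers < max_layers:
        data = gzip.decompress(data)
        layers += 1
    return data

# Assumption: only unambiguous nucleotides count as "standard" characters.
STANDARD = re.compile(r"[ACGT]+")

def clean_sequence(seq: str):
    """Remove gaps, strip leading/trailing Ns, reject anything non-standard.

    Returns the cleaned sequence, or None if the record should be dropped.
    """
    seq = seq.upper().replace("-", "").strip("N")
    return seq if STANDARD.fullmatch(seq) else None
```

    Checking the magic bytes rather than the file extension means the same helper works whether a given file was compressed once or twice.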

  2. Data from: COInr a comprehensive, non-redundant COI database from NCBI-nt...

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated May 5, 2023
    Emese Meglecz; Emese Meglecz (2023). COInr a comprehensive, non-redundant COI database from NCBI-nt and BOLD [Dataset]. http://doi.org/10.5281/zenodo.6555985
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Emese Meglecz; Emese Meglecz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    COInr is a non-redundant, comprehensive database of COI sequences extracted from NCBI-nt and BOLD. It is not limited to a taxon, a gene region, or a taxonomic resolution. Sequences are dereplicated between databases and within taxa.

    Each taxon has a unique taxonomic identifier (taxID), which is fundamental to avoiding ambiguous associations of homonyms and synonyms in the source database. TaxIDs form a coherent hierarchical system fully compatible with the NCBI taxIDs, allowing the creation of their full or ranked lineages.

    COInr is a good starting point to create custom databases according to the users’ needs using mkCOInr scripts available at https://github.com/meglecz/mkCOInr
    It is possible to select/eliminate sequences for a list of taxa, select a specific gene region, select for minimum taxonomic resolution, add new custom sequences, and format the database for BLAST, QIIME, or RDP classifiers.

  3. COins database

    • figshare.com
    zip
    Updated Aug 29, 2024
    Giulia Magoga (2024). COins database [Dataset]. http://doi.org/10.6084/m9.figshare.19130465.v4
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Giulia Magoga
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    COins is a database of COI-5P sequences of insects that includes over 532,000 representative sequences of more than 106,000 species, specifically formatted for the QIIME2 software platform. It was developed through a combination of automated and manually curated steps, starting from insect COI sequences available in the Barcode of Life Data System and selecting sequences that comply with several standards, including a species-level identification.

    • seq-degapped.qza --> reference sequences
    • taxonomy.qza --> sequence taxonomy
    • SklearnClassifier_COins_QIIME2_v2024.5.qza (NEW!) --> naïve Bayes taxonomic classifier trained on COins (QIIME2 version 2024.5)
    • SklearnClassifier_COins_QIIME2_v2023.5.qza --> naïve Bayes taxonomic classifier trained on COins (QIIME2 version 2023.5)
    • SklearnClassifier_COins_QIIME2_v2022.2.qza --> naïve Bayes taxonomic classifier trained on COins (QIIME2 version 2022.2)
    • Sequences_metadata1.tsv --> Identification procedure of the voucher specimens from which reference sequences were developed. The identification procedure is reported for each sequence included in COins (BOLD id reported in the BOLDid reference column) and for all identical sequences within haplotypes that were removed at Step 5 of COins curation (those for which a BOLD id is not available in the BOLDid reference column). The haplotype to which each sequence belongs is reported in the Haplotype column (haplotypes of each species are labeled with increasing numbers). Identification procedure information is derived from the sequence-associated metadata provided by the BOLD system.
    • Sequences_metadata2.tsv --> Identical sequences belonging to different species present within COins. Each row represents a cluster of identical sequences associated with different species; the sequences included in the cluster are labeled with species name and BOLD id.
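    The idea behind Sequences_metadata2.tsv (clusters of identical sequences assigned to different species) can be illustrated with a short Python sketch. It is not the COins curation code: the function name, the (BOLD id, species, sequence) record layout and the example records are all hypothetical.

```python
from collections import defaultdict

def cross_species_clusters(records):
    """Group records by identical sequence and keep multi-species groups.

    records: iterable of (bold_id, species, sequence) tuples (assumed layout).
    Returns a list of clusters, each a sorted list of (species, bold_id).
    """
    by_sequence = defaultdict(list)
    for bold_id, species, sequence in records:
        by_sequence[sequence.upper()].append((species, bold_id))
    return [
        sorted(members)
        for members in by_sequence.values()
        if len({species for species, _ in members}) > 1
    ]

# Made-up example records, for illustration only.
example = [
    ("ID0001", "Apis mellifera", "ACGTACGT"),
    ("ID0002", "Apis cerana", "acgtacgt"),
    ("ID0003", "Bombus terrestris", "GGCCGGCC"),
    ("ID0004", "Bombus terrestris", "GGCCGGCC"),
]
clusters = cross_species_clusters(example)
```

    Only the first two records end up in a cluster here, because the two Bombus records share a sequence but belong to the same species.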

  4. COI data from: Environmental DNA metabarcoding differentiates between...

    • demo.gbif.org
    • obis.org
    • +2 more
    Updated Sep 24, 2025
    United States Geological Survey (2025). COI data from: Environmental DNA metabarcoding differentiates between micro-habitats within the rocky intertidal (Shea & Boehm, 2024) [Dataset]. http://doi.org/10.15468/33artc
    Dataset provided by
    Global Biodiversity Information Facility: https://www.gbif.org/
    United States Geological Survey
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 28, 2022
    Description

    This is mitochondrial Cytochrome c Oxidase I gene (COI) metabarcoding data of surface seawater metazoan communities from three distinct locations in the rocky intertidal, Pillar Point, Half Moon Bay, California, USA that were sampled over one tidal exposure period on 28 January 2022. This work is associated with a publication in Environmental DNA (https://doi.org/10.1002/edn3.521).

    [This dataset was processed using the GBIF eDNA converter tool.]

  5. Over 2.5 million COI sequences in GenBank and growing

    • plos.figshare.com
    pdf
    Updated Jun 3, 2023
    Teresita M. Porter; Mehrdad Hajibabaei (2023). Over 2.5 million COI sequences in GenBank and growing [Dataset]. http://doi.org/10.1371/journal.pone.0200177
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Teresita M. Porter; Mehrdad Hajibabaei
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The increasing popularity of cytochrome c oxidase subunit 1 (COI) DNA metabarcoding warrants a careful look at the underlying reference databases used to make high-throughput taxonomic assignments. The objectives of this study are to document trends and assess the future usability of COI records for metabarcode identification. The number of COI records deposited to the NCBI nucleotide database has increased by a geometric average of 51% per year, from 8,137 records deposited in 2003 to a cumulative total of ~ 2.5 million by the end of 2017. About half of these records are fully identified to the species rank, 92% are at least 500 bp in length, 74% have a country annotation, and 51% have latitude-longitude annotations. To ensure the future usability of COI records in GenBank we suggest: 1) Improving the geographic representation of COI records, 2) Improving the cross-referencing of COI records in the Barcode of Life Data System and GenBank to facilitate consolidation and incorporation into existing bioinformatic pipelines, 3) Adherence to the minimum information about a marker gene sequence guidelines, and 4) Integrating metabarcodes from eDNA and mixed community studies with existing reference sequences. The growth of COI reference records over the past 15 years has been substantial and is likely to be a resource across many fields for years to come.

  6. Distribution of COI records in BOLD.

    • plos.figshare.com
    xls
    Updated Jun 2, 2023
    Teresita M. Porter; Mehrdad Hajibabaei (2023). Distribution of COI records in BOLD. [Dataset]. http://doi.org/10.1371/journal.pone.0200177.t001
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Teresita M. Porter; Mehrdad Hajibabaei
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Distribution of COI records in BOLD.

  7. COI data from: Invasive species detection along coastal harbours in northern...

    • researchdata.se
    • gbif.org
    • +2 more
    Updated Sep 23, 2025
    Eivind Stensrud (2025). COI data from: Invasive species detection along coastal harbours in northern region of Vastra Gotaland 2024 [Dataset]. http://doi.org/10.15468/AR7WWX
    Dataset provided by
    Global Biodiversity Information Facility: https://www.gbif.org/
    Authors
    Eivind Stensrud
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Västra Götaland County
    Description

    Environmental DNA sequence data of invasive species along the harbors in the northern regions of Vastra Gotaland. Sequence data were obtained from both water samples and plankton samples from two sampling events, one in the summer and one in the autumn of 2024. The amplicons were amplified with COI Leray-XT primers (Wangensteen et al. 2018) and annotated against an in-house database of invasive and non-indigenous species of Norway, Sweden and EEA created with EchoPipe (Stensrud et al., in press). The data were created for Länsstyrelsen Vastra Gotaland and financed through HaV.

    This dataset was published via the SBDI ASV portal (https://asv-portal.biodiversitydata.se/).

  8. COI sequences database for tardigrades classification

    • figshare.com
    txt
    Updated Jul 18, 2025
    Matteo Vecchi (2025). COI sequences database for tardigrades classification [Dataset]. http://doi.org/10.6084/m9.figshare.29596613.v1
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Matteo Vecchi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    COI sequence database for tardigrade classification used in the manuscript "Occurrence of tardigrade populations in holes on concrete blocks", Zoological Studies (accepted, 2025).

  9. Table 1_DNA barcoding deep-water zooplankton from the Gulf of Alaska, North...

    • figshare.com
    xlsx
    Updated Mar 27, 2025
    + more versions
    Jennifer M. Questel; Caitlin A. Smoot; Allen G. Collins; Dhugal J. Lindsay; Russell R. Hopcroft (2025). Table 1_DNA barcoding deep-water zooplankton from the Gulf of Alaska, North Pacific Ocean.xlsx [Dataset]. http://doi.org/10.3389/fmars.2025.1515048.s001
    Dataset provided by
    Frontiers
    Authors
    Jennifer M. Questel; Caitlin A. Smoot; Allen G. Collins; Dhugal J. Lindsay; Russell R. Hopcroft
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Pacific Ocean, Gulf of Alaska, North Pacific Ocean
    Description

    DNA barcoding is a method of identifying individual organisms using short DNA fragments matched to a database of reference sequences. For metazoan plankton, a high proportion of species that reside in the deep ocean still lack reliable reference sequences for genetic markers for barcoding and systematics. We report on substantial taxonomic and barcoding efforts across major zooplankton taxonomic groups collected from surface waters to the rarely sampled abyssopelagic zone (0 – 4300 m) from the Gulf of Alaska, North Pacific Ocean. Over 1000 specimens were identified, from which the mitochondrial 16S and COI and nuclear 18S rRNA genes were sequenced. In total, 1462 sequences for 254 unique taxa were generated, adding new barcodes for 107 species, including 12 undescribed species of cnidarians, that previously lacked DNA sequences for at least one of the three genes. Additionally, we introduce the use of a new Open Nomenclature qualifier deoxyribonucleic acid abbreviation DNA (e.g., Genus DNA species, DNA Genus). This qualifier was used for specimens that could not be morphologically identified but could be assigned a low-level taxonomic identification based on the clustering of DNA barcode genes using phylogenetic trees (100% bootstrap support), where at least one of the sequences in that clade could be referred to a physical specimen (or photographs) where identification could be corroborated through morphological analyses. DNA barcodes from this work are incorporated into the MetaZooGene Atlas and Database, an open-access data and metadata portal for barcoding genes used for classifying and identifying marine organisms. As environmental sequencing (i.e., metabarcoding, metagenetics, and eDNA) becomes an increasingly common approach in marine ecosystem studies, continued population of such reference DNA sequence databases must remain a high priority.

  10. Cytochrome oxidase subunit I (COI) and internally transcribed spacer region...

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 21, 2025
    U.S. Geological Survey (2025). Cytochrome oxidase subunit I (COI) and internally transcribed spacer region I (ITS1) DNA sequences of Tetracapsuloides bryosalmonae parasites in kidney tissue collected from various water bodies in Montana, USA from 2016 to 2019 [Dataset]. https://catalog.data.gov/dataset/cytochrome-oxidase-subunit-i-coi-and-internally-transcribed-spacer-region-i-its1-dna-seque
    Dataset provided by
    United States Geological Survey: http://www.usgs.gov/
    Area covered
    United States
    Description

    This dataset contains DNA sequences of two genomic regions, cytochrome oxidase subunit one and internally transcribed spacer region one, obtained from Tetracapsuloides bryosalmonae infecting the tissue of fish collected from various locations throughout western and central Montana, USA.

  11. BCdatabaser - coi.arthropoda.none.2023-05-08.zip

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jul 2, 2023
    + more versions
    Rachele Sarah Wilson; Rachele Sarah Wilson (2023). BCdatabaser - coi.arthropoda.none.2023-05-08.zip [Dataset]. http://doi.org/10.5281/zenodo.7911657
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Rachele Sarah Wilson; Rachele Sarah Wilson
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Parameters:

    • --marker-search-string COI OR CO1 OR 'Cytochrome oxidase 1' OR 'Cytochrome oxidase I'
    • --taxonomic-range Arthropoda
    • --sequence-length-filter 100:2000
    • --sequences-per-taxon 9
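
    As a rough illustration of what the last two parameters might do, the Python sketch below applies a length window and a per-taxon cap to a list of records. It rests on assumptions not stated in the listing: that `100:2000` is an inclusive minimum:maximum sequence length and that `9` is the maximum number of sequences kept per taxon. The function name and record layout are hypothetical, and BCdatabaser's actual selection logic may differ.

```python
from collections import defaultdict

def filter_records(records, length_range="100:2000", per_taxon=9):
    """Keep sequences inside the length window, at most per_taxon per taxon.

    records: iterable of (taxon, sequence) pairs (assumed layout).
    length_range: "min:max" string, interpreted here as inclusive bounds.
    Returns the kept pairs in their original order.
    """
    min_len, max_len = (int(part) for part in length_range.split(":"))
    kept_per_taxon = defaultdict(int)
    kept = []
    for taxon, sequence in records:
        if not (min_len <= len(sequence) <= max_len):
            continue  # outside the assumed length window
        if kept_per_taxon[taxon] >= per_taxon:
            continue  # assumed per-taxon cap already reached
        kept_per_taxon[taxon] += 1
        kept.append((taxon, sequence))
    return kept
```

    This sketch simply keeps the first sequences it encounters for each taxon; a real tool could just as well rank or sample them.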

  12. iBOL DNA barcode data for animals (COI)

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    • +1 more
    Updated Jan 24, 2020
    BOLD (2020). iBOL DNA barcode data for animals (COI) [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_28843
    Authors
    BOLD
    Description

    International Barcode of Life project (iBOL) Data Packages for animals, releases 0.5-6.00.

  13. NEON (National Ecological Observatory Network) Mosquito sequences DNA...

    • data.neonscience.org
    zip
    Updated Oct 15, 2024
    + more versions
    (2024). NEON (National Ecological Observatory Network) Mosquito sequences DNA barcode (DP1.10038.001) [Dataset]. https://data.neonscience.org/data-products/DP1.10038.001
    License

    https://www.neonscience.org/data-samples/data-policies-citation

    Time period covered
    Jul 2012 - Oct 2024
    Area covered
    RMNP, DSNY, WOOD, ABBY, JERC, SCBI, ORNL, LAJA, UKFS, KONZ
    Description

    COI DNA sequences from select mosquitoes

  14. Reference sequence database for eDNA metabarcoding of San Francisco estuary...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1 more
    zip
    Updated Aug 4, 2023
    Raman Nagarajan; Ann Holmes; Andrea Schreier (2023). Reference sequence database for eDNA metabarcoding of San Francisco estuary fishes and invertebrates [Dataset]. http://doi.org/10.5061/dryad.0p2ngf25z
    Dataset provided by
    University of California, Davis
    Authors
    Raman Nagarajan; Ann Holmes; Andrea Schreier
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    San Francisco
    Description

    Environmental DNA (eDNA) methods complement traditional monitoring and can be configured to detect multiple species simultaneously. One such approach, eDNA metabarcoding, uses high-throughput DNA sequencing to indirectly detect many different organisms, spanning broad taxonomic boundaries, from water samples. We are optimizing a non-invasive, low-cost eDNA metabarcoding protocol to be used in conjunction with existing monitoring programs. One resource that is currently lacking for metabarcoding studies in general, including those in the San Francisco Estuary (SFE), is a comprehensive database of DNA barcode reference sequences. Without these foundational data, many species go undetected or are misidentified in metabarcoding studies. To meet this need, we generated a custom barcode sequence database for the SFE by DNA sequencing and mining of public DNA sequence data for estuarine and freshwater species of interest to monitoring programs and ecological studies. Here we present custom reference sequence databases for three barcodes: Cytochrome C Oxidase I (COI), 12S MiFish and 16S.

    Methods

    Data were collected from two sources. Specimens of fish and invertebrates collected from the San Francisco Estuary were used for Sanger DNA sequencing. DNA extractions were performed using the Qiagen Blood and Tissue kit, and PCR was performed using primers to amplify the entire barcode sequence. Raw chromatogram data files were manually examined for quality control and aligned, and flanking and primer sequences were trimmed using CodonCode Aligner. For species without physical specimens, or for those specimens that failed PCR/sequencing/QC, publicly available DNA sequences were downloaded from GenBank, then aligned and trimmed to the barcode region using CodonCode Aligner. The combined experimental and downloaded sequences for each barcode were placed into a single .txt file formatted for use with the DADA2 metabarcoding software. For all sequences, an additional verification step was performed by querying the BLASTn database. A separate metadata file (.csv) was also generated for each barcode; it includes the specimen name (if applicable), GenBank accession numbers (if applicable), taxonomic information, common name, and, if available, specimen locality, US state, and collection date.

  15. BCdatabaser - coi.plethodontidae.none.2021-07-13.zip

    • zenodo.org
    • data-staging.niaid.nih.gov
    • +1 more
    zip
    Updated Jul 14, 2021
    Todd Pierson; Todd Pierson (2021). BCdatabaser - coi.plethodontidae.none.2021-07-13.zip [Dataset]. http://doi.org/10.5281/zenodo.5096321
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Todd Pierson; Todd Pierson
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Parameters:

    • --marker-search-string COI
    • --taxonomic-range Plethodontidae
    • --sequence-length-filter 100:2000
    • --sequences-per-taxon 9

  16. Computational stability data of CoI from Density Functional Theory...

    • oqmd.org
    Updated Mar 24, 2022
    + more versions
    The Open Quantum Materials Database (2022). Computational stability data of CoI from Density Functional Theory calculations [Dataset]. https://www.oqmd.org/materials/composition/CoI
    Dataset authored and provided by
    The Open Quantum Materials Database
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Stability, Composition
    Measurement technique
    Computational, Density Functional Theory
    Description

    This composition appears in the Co-I region of phase space. Its relative stability is shown in the Co-I phase diagram (left). The relative stability of all other phases at this composition (and the combination of other stable phases, if no compound at this composition is stable) is shown in the relative stability plot (right).

  17. Computational data of Orthorhombic CoI from Density Functional Theory...

    • oqmd.org
    + more versions
    The Open Quantum Materials Database, Computational data of Orthorhombic CoI from Density Functional Theory calculations [Dataset]. https://www.oqmd.org/materials/entry/1219791
    Dataset authored and provided by
    The Open Quantum Materials Database
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Name, Bandgap, Stability, Crystal volume, Formation energy, Symmetry spacegroup, Number of atoms in unit cell
    Measurement technique
    Computational, Density Functional Theory
    Description

    Data obtained from computational DFT calculations on Orthorhombic CoI is provided. Available data include crystal structure, bandgap energy, stability, density of states, and calculation input/output files.

  18. NEON (National Ecological Observatory Network) Ground beetle sequences DNA...

    • data.neonscience.org
    zip
    Updated Nov 15, 2022
    + more versions
    (2022). NEON (National Ecological Observatory Network) Ground beetle sequences DNA barcode (DP1.10020.001) [Dataset]. https://data.neonscience.org/data-products/DP1.10020.001
    License

    https://www.neonscience.org/data-samples/data-policies-citation

    Time period covered
    Jul 2012 - Nov 2022
    Area covered
    RMNP, GUAN, WOOD, OAES, OSBS, GRSM, HARV, SERC, DCFS, BARR
    Description

    COI DNA sequences from select ground beetles

  19. NEON (National Ecological Observatory Network) Fish sequences DNA barcode...

    • data.neonscience.org
    zip
    Updated Dec 15, 2023
    + more versions
    (2023). NEON (National Ecological Observatory Network) Fish sequences DNA barcode (DP1.20105.001) [Dataset]. https://data.neonscience.org/data-products/DP1.20105.001
    License

    https://www.neonscience.org/data-samples/data-policies-citation

    Time period covered
    Nov 2017 - Dec 2023
    Area covered
    TECR, LIRO, GUIL, WLOU, LECO, POSE, BLDE, HOPB, CUPE, SYCA
    Description

    COI DNA sequences from select fish in lakes and wadeable streams

  20. DNA sequences of mitochondrial Cytochrome c oxidase I (COI) genes from deep...

    • data.griidc.org
    • search.dataone.org
    Updated Oct 26, 2018
    + more versions
    Mahmood Shivji (2018). DNA sequences of mitochondrial Cytochrome c oxidase I (COI) genes from deep sea fishes collected during DEEPEND cruise DP05 from 2017-05-01 to 2017-05-11 [Dataset]. http://doi.org/10.7266/n7-rmfn-4d68
    Dataset provided by
    GRIIDC
    Authors
    Mahmood Shivji
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Deep sea fishes were collected in the northern Gulf of Mexico during DEEPEND cruise DP05 from May 1 to 11, 2017. This dataset contains GenBank accession numbers of DNA sequences of the mitochondrial Cytochrome c oxidase I (COI) gene from the fish species collected.

4 scholarly articles cite the dataset "COI reference sequences from BOLD DB" (John Sundh, 2025).