13 datasets found

Data from: DDBJ Sequence Read Archive
neuinfo.org
Updated Sep 9, 2024
Cite
(2024). DDBJ Sequence Read Archive [Dataset]. http://identifiers.org/RRID:SCR_001370/resolver/mentions?q=&i=rrid
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_001370 https://identifiers.org/RRID:SCR_001370/resolver/mentions?q=&i=rrid
Dataset updated
Sep 9, 2024
Description
Archive database for output data generated by next-generation sequencing machines, including the Roche 454 GS System, Illumina Genome Analyzer, Applied Biosystems SOLiD System, and others. DRA is a member of the International Nucleotide Sequence Database Collaboration (INSDC) and archives the data in close collaboration with the NCBI Sequence Read Archive (SRA) and the EBI Sequence Read Archive (ERA). Please submit trace data from conventional capillary sequencers to the DDBJ Trace Archive. THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 16, 2025.
European Nucleotide Archive (ENA)
scicrunch.org
neuinfo.org
Updated Oct 17, 2019
Cite
(2019). European Nucleotide Archive (ENA) [Dataset]. http://identifiers.org/RRID:SCR_006515
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_006515
Dataset updated
Oct 17, 2019
Description
Public archive providing a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. All submitted data, once public, will be exchanged with the NCBI and DDBJ as part of the INSDC data exchange agreement. The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing. A typical workflow includes the isolation and preparation of material for sequencing, a run of a sequencing machine in which sequencing data are produced and a subsequent bioinformatic analysis pipeline. ENA records this information in a data model that covers input information (sample, experimental setup, machine configuration), output machine data (sequence traces, reads and quality scores) and interpreted information (assembly, mapping, functional annotation). Data arrive at ENA from a variety of sources including submissions of raw data, assembled sequences and annotation from small-scale sequencing efforts, data provision from the major European sequencing centers and routine and comprehensive exchange with their partners in the International Nucleotide Sequence Database Collaboration (INSDC). Provision of nucleotide sequence data to ENA or its INSDC partners has become a central and mandatory step in the dissemination of research findings to the scientific community. ENA works with publishers of scientific literature and funding bodies to ensure compliance with these principles and to provide optimal submission systems and data access tools that work seamlessly with the published literature. ENA is made up of a number of distinct databases that include the EMBL Nucleotide Sequence Database (EMBL-Bank), the newly established Sequence Read Archive (SRA) and the Trace Archive. The main tool for downloading ENA data is the ENA Browser, which is available through REST URLs for easy programmatic use.
All ENA data are available through the ENA Browser. Note: EMBL Nucleotide Sequence Database (EMBL-Bank) is entirely included within this resource.
Innuendo Whole Genome And Core Genome Mlst Schemas And Datasets For Yersinia...
explore.openaire.eu
Updated Jul 30, 2018
Cite
Mirko Rossi; Mickael Santos Da Silva; Bruno Filipe Ribeiro-Gonçalves; Diogo Nuno Silva; Miguel Paulo Machado; Mónica Oleastro; Vítor Borges; Joana Isidro; Luis Viera; Jani Halkilahti; Anniina Jaakkonen; Riikka Laukkanen-Ninios; Maria Fredriksson-Ahomaa; Saara Salmenlinna; Marjaana Hakkinen; Javier Garaizar; Joseba Bikandi; Friederike Hilbert; João André Carriço (2018). Innuendo Whole Genome And Core Genome Mlst Schemas And Datasets For Yersinia Enterocolitica [Dataset]. http://doi.org/10.5281/zenodo.1421262
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.1421262
Dataset updated
Jul 30, 2018
Authors
Mirko Rossi; Mickael Santos Da Silva; Bruno Filipe Ribeiro-Gonçalves; Diogo Nuno Silva; Miguel Paulo Machado; Mónica Oleastro; Vítor Borges; Joana Isidro; Luis Viera; Jani Halkilahti; Anniina Jaakkonen; Riikka Laukkanen-Ninios; Maria Fredriksson-Ahomaa; Saara Salmenlinna; Marjaana Hakkinen; Javier Garaizar; Joseba Bikandi; Friederike Hilbert; João André Carriço
Description
Dataset
All the raw reads deposited in the European Nucleotide Archive (ENA) or in the NCBI Sequence Read Archive (SRA) as Y. enterocolitica at the time of the analysis (August 2018) were retrieved using getSeqENA. A total of 252 genomes were successfully assembled using INNUca v3.1. In addition to publicly available genomes, the database includes 79 novel Y. enterocolitica strains which belong to the INNUENDO Sequence Dataset (PRJEB27020). File 'Metadata/Yenterocolitica_metadata.txt' contains metadata for each strain, including country and year of isolation, source classification, taxon of the host, serotype, biotype, pathotype (according to the patho_typing software) and classical pubMLST 7-gene ST according to Hall et al., 2005. The directory 'Genomes' contains all 331 INNUca v3.1 assemblies of the strains listed in 'Metadata/Yenterocolitica_metadata.txt'.

Schema creation and validation
All 331 genomes were used to create the schema with the chewBBACA suite. The quality of the loci was assessed using chewBBACA Schema Evaluation, and loci with single alleles, those with high length variability (i.e. if more than 1 allele is outside the mode +/- 0.05 size) and those present in less than 1% of the genomes were removed. The wgMLST schema was further curated by excluding all loci detected as “Repeated Loci” and loci annotated as “non-informative paralogous hit (NIPH/ NIPHEM)” or “Allele Larger/ Smaller than length mode (ALM/ ASM)” by the chewBBACA Allele Calling in more than 1% of a dataset. File 'Schema/Yenterocolitica_wgMLST_ 6344_schema.tar.gz' contains the wgMLST schema formatted for chewBBACA and includes a total of 6,344 loci. File 'Schema/Yenterocolitica_cgMLST_ 2406_listGenes.txt' contains the list of genes from the wgMLST schema which defines the cgMLST schema. The cgMLST schema consists of 2,406 loci, defined as the loci present in at least 99% of the 331 Y. enterocolitica genomes. Genomes have no more than 2% of missing loci. File 'Allele_Profles/Yenterocolitica_wgMLST_alleleProfiles.tsv' contains the wgMLST allelic profile of the 331 Y. enterocolitica genomes of the dataset. Please note that missing loci follow the annotation of the chewBBACA Allele Calling software. File 'Allele_Profles/Yenterocolitica_cgMLST_alleleProfiles.tsv' contains the cgMLST allelic profile of the 331 Y. enterocolitica genomes of the dataset. Please note that missing loci are indicated with a zero.

Additional citation
The schemas are prepared to be used with chewBBACA. When using the schemas in this repository please also cite: Silva M, Machado M, Silva D, Rossi M, Moran-Gilad J, Santos S, Ramirez M, Carriço J. chewBBACA: A complete suite for gene-by-gene schema creation and strain identification. 15/03/2018. M Gen 4(3): doi:10.1099/mgen.0.000166 http://mgen.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000166

The raw sequence data of the isolates' genomes produced within the INNUENDO project were submitted to the European Nucleotide Archive (ENA) database and are publicly available under the project accession number PRJEB27020. When using the schemas, the assemblies or the allele profiles please include the project number in your publication.

The research from the INNUENDO project has received funding from the European Food Safety Authority (EFSA), grant agreement GP/EFSA/AFSCO/2015/01/CT2 (New approaches in identifying and characterizing microbial and chemical hazards), and from the Government of the Basque Country. The conclusions, findings, and opinions expressed in this repository reflect only the view of the INNUENDO consortium members and not the official position of EFSA or of the Government of the Basque Country. EFSA and the Government of the Basque Country are not responsible for any use that may be made of the information included in this repository.

The INNUENDO consortium thanks the Austrian Agency for Health and Food Safety Limited for participating in the project by providing strains. The consortium thanks all the researchers and authorities worldwide who contribute by submitting the raw sequences of bacterial strains to public repositories. The project was possible thanks to the support of CSC - Tieteen tietotekniikan keskus Oy (https://www.csc.fi/) and of INCD (http://www.incd.pt/, funded by FCT and FEDER under the project 22153-01/SAICT/2016), which provided access to cloud computing resources.
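The cgMLST definition above (loci present in at least 99% of genomes, with missing loci coded as zero) can be illustrated with a minimal Python sketch. This is not the chewBBACA implementation; the function names `core_loci` and `missing_fraction`, the in-memory dictionary layout, and the toy data are assumptions made for illustration only.

```python
def core_loci(profiles, min_presence=0.99):
    """Return loci called (allele != 0) in at least `min_presence` of genomes.

    `profiles` maps genome name -> {locus name -> allele id}, where 0 means
    the locus is missing (the convention described for the cgMLST table).
    This layout is hypothetical, chosen to keep the example self-contained.
    """
    genomes = list(profiles)
    loci = sorted({locus for calls in profiles.values() for locus in calls})
    selected = []
    for locus in loci:
        present = sum(1 for g in genomes if profiles[g].get(locus, 0) != 0)
        if present / len(genomes) >= min_presence:
            selected.append(locus)
    return selected


def missing_fraction(calls, loci):
    """Fraction of the given loci that are missing (0 or absent) in one genome."""
    missing = sum(1 for locus in loci if calls.get(locus, 0) == 0)
    return missing / len(loci)


# Toy data: four genomes, three loci (invented values, not from the dataset).
toy = {
    "g1": {"locA": 1, "locB": 2, "locC": 0},
    "g2": {"locA": 1, "locB": 0, "locC": 0},
    "g3": {"locA": 3, "locB": 4, "locC": 5},
    "g4": {"locA": 2, "locB": 1, "locC": 0},
}

strict = core_loci(toy)          # 99% threshold, as in the description
relaxed = core_loci(toy, 0.75)   # relaxed threshold for the tiny toy set
print(strict, relaxed, missing_fraction(toy["g2"], relaxed))
```

With real data, the same check would be run on the 331-genome allele profile table after parsing the TSV; the parsing step is omitted here because the exact column layout is not given in the description.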
Data from: List of size fractionated eukaryotic plankton community samples...
doi.pangaea.de
zip
Updated Feb 20, 2015
Cite
Colomban De Vargas; Participants Tara Oceans Expedition; Coordinators Tara Oceans Consortium (2015). List of size fractionated eukaryotic plankton community samples and associated metadata (Database W1) [Dataset]. http://doi.org/10.1594/PANGAEA.843017
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.1594/PANGAEA.843017
Dataset updated
Feb 20, 2015
Dataset provided by
PANGAEA
Authors
Colomban De Vargas; Participants Tara Oceans Expedition; Coordinators Tara Oceans Consortium
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
The present data set provides an Excel file in a zip archive. The file lists 334 samples of size-fractionated eukaryotic plankton communities with a suite of associated metadata (Database W1). Note that while most samples represent the piconano- (0.8-5 µm, 73 samples), nano- (5-20 µm, 74 samples), micro- (20-180 µm, 70 samples), and meso- (180-2000 µm, 76 samples) planktonic size fractions, some represent different organismal size fractions: 0.2-3 µm (1 sample), 0.8-20 µm (6 samples), 0.8 µm - infinity (33 samples), and 3-20 µm (1 sample). The table contains the following fields: a unique sample sequence identifier; the sampling station identifier; the Tara Oceans sample identifier (TARA_xxxxxxxxxx); an INSDC accession number that allows retrieval of raw sequence data from the major nucleotide databases (short read archives at EBI, NCBI or DDBJ); the depth of sampling (Subsurface - SUR or Deep Chlorophyll Maximum - DCM); the targeted size range; the sequence template (either DNA or WGA/DNA if DNA extracted from the filters was Whole Genome Amplified); the latitude of the sampling event (decimal degrees); the longitude of the sampling event (decimal degrees); the time and date of the sampling event; the device used to collect the sample; the logsheet event corresponding to the sampling event; the volume of water sampled (liters). Information then follows on the cleaning bioinformatics pipeline shown in Figure W2 of the supplementary literature publication: the number of merged pairs present in the raw sequence file; the number of those sequences matching both primers; the number of sequences after quality-check filtering; the number of sequences after chimera removal; and finally the number of sequences after selecting only barcodes present in at least three copies in total and in at least two samples.
Finally, the following are given for each sequence sample: the number of distinct sequences (metabarcodes); the number of OTUs; the average number of barcodes per OTU; the Shannon diversity index based on barcodes for each sample (URL of W4 dataset in PANGAEA); and the Shannon diversity index based on each OTU (URL of W5 dataset in PANGAEA).
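For readers unfamiliar with the Shannon diversity index reported per sample, the following minimal Python sketch shows the standard formula H = -Σ p_i ln p_i computed from abundance counts. The function name `shannon_index` and the use of the natural logarithm are assumptions; the description does not state which log base or exact procedure the authors used.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over non-zero counts.

    `counts` holds the abundance of each barcode (or OTU) in one sample.
    The natural logarithm is an assumption; other bases only rescale H.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Invented abundances for illustration (not values from Database W1).
even_sample = [10, 10, 10, 10]   # four equally abundant barcodes
skewed_sample = [37, 1, 1, 1]    # one dominant barcode
print(shannon_index(even_sample), shannon_index(skewed_sample))
```

An evenly distributed sample reaches the maximum ln(S) for S categories, while a sample dominated by one barcode scores lower, which is why the index is reported both per barcode and per OTU.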
Investigation of Psylliodes chrysocephala aestivation by RNA-seq
figshare.com
xlsx
Updated Nov 10, 2023
Cite
Doga CEDDEN (2023). Investigation of Psylliodes chrysocephala aestivation by RNA-seq [Dataset]. http://doi.org/10.6084/m9.figshare.24085815.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.24085815.v1
Dataset updated
Nov 10, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Doga CEDDEN
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Library preparation: Total RNA from pre-aestivation (5-day-old), aestivation (30-day-old), and post-aestivation (55-day-old) female beetles was extracted using the ZYMO Quick-RNA Tissue/Insect Kit (ZYMO Research, Irvine, CA, USA) and cleaned using the TURBO DNA-free™ kit (Thermo Fisher Scientific, Langenselbold, Germany) according to the manufacturer’s instructions. We opted to sample only the females to eliminate sex-related variations. RNA quantity was determined using a Nanodrop ND-1000 UV/Vis spectrophotometer (Thermo Fisher Scientific). The integrity of the RNA samples was determined using the Agilent 2100 Bioanalyzer and an RNA 6000 Nano Kit (Agilent Technologies, Santa Clara, CA, USA). RIN values ≥ 7.0 were considered appropriate for mRNA library preparation. In total, 10 libraries (4, 3, and 3 libraries for the pre-aestivation, aestivation, and post-aestivation stages, respectively) were prepared using the NEBNext® Poly(A) mRNA Magnetic Isolation Module kit (NEB E7490, New England Biolabs) according to the manufacturer’s instructions. The quality of the libraries was checked via RNA fragment analysis conducted on the Agilent 2100 Bioanalyzer using the Agilent DNF-935 Reagent Kit (Agilent Technologies). The libraries were pooled based on their concentration, and an overall concentration of 3.4 ng/µL was obtained. The sequencing service was provided by BGI Genomics Tech Solutions Co. Ltd (Hong Kong) on a DNBSEQ-T7 platform. The ten raw read files were deposited at the Sequence Read Archive (SRA) database of NCBI under the accessions SAMN33022552 - SAMN33022561.

De novo assembly and functional annotation: Erroneous k-mers from paired read ends were removed using Rcorrector (v1.0.5) with the default options (Song & Florea, 2015), and the unfixable reads were discarded using the “FilterUncorrectabledPEfastq.py” function in Transcriptome Assembly Tools (Song & Florea, 2015). The adaptor sequences were removed from the reads, and the reads having a quality score above 30 were retained using TrimGalore! (v0.6.7). The cleaned reads (n = 3 per three adult phases) were de novo assembled using Trinity with default options. In total, 224 million bases covering 341,670 transcripts, including putative isoforms, were successfully assembled. The de novo assembly had an N50 value of 1532 and a BUSCO (v5.4.2) completeness score of 96.7% when compared against the endopterygota lineage (BUSCO.v4 datasets). Furthermore, the putative isoforms were combined to obtain a supertranscriptome that contained 189,229 transcripts in total. The supertranscriptome was deposited at GenBank as a Transcriptome Shotgun Assembly (TSA) under the accession GKIH00000000.1. The transcriptome (including isoforms) was annotated using Trinotate (v3.2.2), which combines the outputs of NCBI BLAST+ (v2.13.0; nucleotide and predicted protein BLAST), TransDecoder (v5.5.0; coding region prediction), SignalP (v4.0; signal peptide prediction), TMHMM (v2.0; transmembrane domain prediction), and HMMER (v3.3.2; homology search) packages into an SQLite annotation database. The latest uniprot_sprot (04/2022) and Pfam-A (11/2015) databases were downloaded using Trinotate, and the default E-value thresholds were used during the searches with BLAST+ and HMMER, respectively. The resulting annotation database was used to extract gene ontology (GO) terms associated with individual genes using the “extract_GO_assignments_from_Trinotate_xls.pl” script, whereas the SignalP and TMHMM outputs were manually extracted using Excel spreadsheets. The longest protein-coding regions in the supertranscript data predicted by TransDecoder were subjected to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation via GhostKOALA v2.2 (https://www.kegg.jp/ghostkoala/). The annotation database was made publicly available on Figshare (https://doi.org/10.6084/m9.figshare.21922938).

Differentially expressed genes: The read counts per putative gene were calculated using Salmon (v1.9) by mapping the cleaned reads onto our de novo transcriptome. Genes that had fewer than 15 read counts across all samples were filtered out, and the R package “DESeq2” (v4.2) was used to identify the differentially expressed genes in the following comparisons: aestivation vs. pre-aestivation, aestivation vs. post-aestivation, and pre-aestivation vs. post-aestivation (DESeq2 was also allowed to conduct its default filtering). For each comparison, genes with adjusted P values (testing the null hypothesis that the log2 fold change (LFC) was 0) below 0.05 and LFC values below -1 or above 1 were accepted as significantly down- or up-regulated, respectively.

Enrichment analyses: The “enricher” function in the R package “clusterProfiler” was used to analyze the enrichment status of GO terms and KEGG pathways associated with the differentially expressed genes in the three pair-wise comparisons. All the genes that had passed the filtration before the DESeq2 analysis served as the background. Importantly, we did not distinguish between up- and down-regulation during the enrichment analyses due to the ambiguous nature of the term and pathway annotations. We selected the top 14 most significantly enriched GO terms and the top 3 most significantly enriched KEGG pathways to be shown in the bubble plots (full enrichment results were provided in Fig. S). The dataset was also investigated in terms of the number of genes predicted to have signal peptides, transmembrane domains, both, or neither. The number of genes belonging to each category was determined by manually investigating the SQLite annotation database, and Chi-squared tests were performed to compare the proportion of each category among differentially expressed genes with that among the background genes. Here, the upregulated and downregulated genes were analyzed separately, and Bonferroni correction was applied (P < .05/18 = .002). The gene hits from significantly enriched GO terms of interest were selected for the visualization of their expression at the three adult stages. A custom R script was used to Z-normalize the expression of each gene across the three adult stages, and GraphPad Prism v10.0 was used to construct the heat maps. The names of the genes were extracted from the annotation database constructed in this study.
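The significance rule and the per-gene Z-normalization described above can be sketched in a few lines of Python. This is an illustration only, not the authors' custom R script: the function names `classify_gene` and `z_normalize` are hypothetical, and the use of the population standard deviation is an assumption, since the description does not say which estimator was used.

```python
from statistics import mean, pstdev


def classify_gene(lfc, padj, alpha=0.05, lfc_cutoff=1.0):
    """Apply the stated rule: adjusted P < 0.05 and LFC below -1 or above 1."""
    if padj < alpha and lfc > lfc_cutoff:
        return "up"
    if padj < alpha and lfc < -lfc_cutoff:
        return "down"
    return "not significant"


def z_normalize(values):
    """Center one gene's expression values on 0 and scale to unit variance.

    Uses the population standard deviation (an assumption); a gene with
    constant expression maps to all zeros to avoid division by zero.
    """
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - m) / sd for v in values]


# Invented numbers for illustration only.
print(classify_gene(lfc=1.8, padj=0.003))    # passes both thresholds
print(classify_gene(lfc=0.6, padj=0.0001))   # significant P, small effect
print(z_normalize([12.0, 30.0, 18.0]))       # pre-, during, post-aestivation
```

Z-normalizing each gene separately puts genes with very different absolute expression levels on a common scale, which is what makes a shared heat-map color scale meaningful.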
Data from: Updating splits, lumps, and shuffles: Reconciling GenBank names...
data-staging.niaid.nih.gov
search.dataone.org
zip
Updated Aug 25, 2022
Cite
Peter A Hosner; Rebecca T Kimball; Edward L Braun; J Gordon Burleigh; Peter Hosner; Min Zhao; Rebecca Kimball; Edward Braun; Gordon Burleigh (2022). Updating splits, lumps, and shuffles: Reconciling GenBank names with standardized avian taxonomies [Dataset]. http://doi.org/10.5061/dryad.gtht76hqf
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.gtht76hqf
Dataset updated
Aug 25, 2022
Dataset provided by
University of Florida
University of Copenhagen
Authors
Peter A Hosner; Rebecca T Kimball; Edward L Braun; J Gordon Burleigh; Peter Hosner; Min Zhao; Rebecca Kimball; Edward Braun; Gordon Burleigh
License
https://spdx.org/licenses/CC0-1.0.html
Description
Abstract
Biodiversity research has advanced by testing expectations of ecological and evolutionary hypotheses through the linking of large-scale genetic, distributional, and trait datasets. The rise of molecular systematics over the past 30 years has resulted in a wealth of DNA sequences from around the globe. Yet, advances in molecular systematics also have created taxonomic instability, as new estimates of evolutionary relationships and interpretations of species limits have required widespread scientific name changes. Taxonomic instability, colloquially “splits, lumps, and shuffles,” presents logistical challenges to large-scale biodiversity research because (1) the same species or sets of populations may be listed under different names in different data sources, or (2) the same name may apply to different sets of populations representing different taxonomic concepts. Consequently, distributional and trait data are often difficult to link directly to primary DNA sequence data without extensive and time-consuming curation. Here, we present RANT: Reconciliation of Avian NCBI Taxonomy. RANT applies taxonomic reconciliation to standardize avian taxon names in use in NCBI GenBank, a primary source of genetic data, to a widely used and regularly updated avian taxonomy: eBird/Clements. Of 14,341 avian species/subspecies names in GenBank, 11,031 directly matched an eBird/Clements name; these link to more than 6 million nucleotide sequences. For the remaining unmatched avian names in GenBank, we used Avibase’s system of taxonomic concepts, taxonomic descriptions in Cornell’s Birds of the World, and DNA sequence metadata to identify corresponding eBird/Clements names. Reconciled names linked to more than 600,000 nucleotide sequences, ~9% of all avian sequences on GenBank. Nearly 10% of eBird/Clements names had nucleotide sequences listed under 2 or more GenBank names.
Our taxonomic reconciliation is a first step towards rigorous and open-source curation of avian GenBank sequences and is available at GitHub, where it can be updated to correspond to future annual eBird/Clements taxonomic updates.

Methods

Taxonomic reconciliation
We downloaded all names from the NCBI Taxonomy database (Schoch et al., 2020) that descended from “Aves” (TaxID: 8782) on 3 May 2020 (Data Repository D2). From this list, we extracted all species and subspecies names as well as their NCBI Taxonomy ID (TaxID) numbers. We then ran a custom Perl script (Data Repository D3) to exactly match binomial (genus, species) and trinomial (genus, species, subspecies) names from NCBI Taxonomy to the names recognized by the eBird/Clements v2019 Integrated Checklist (August 2019; Data Repository D4). For each mismatch with the NCBI Taxonomy name, we then identified the corresponding equivalent eBird/Clements species or subspecies. We first searched for names in Avibase (Lepage et al., 2014). However, Avibase’s search function currently facilitates only exact matches to taxonomies it implements. For names that were not an exact match to an Avibase taxonomic concept, we implemented web searches (Google), which often identified minor spelling differences, consulted Cornell’s Birds of the World Online (https://birdsoftheworld.org), and consulted relevant literature, often the papers that first published those sequence data. We classified nine categories of naming mismatches resulting from discrepancies between GenBank and eBird/Clements names: split, lump, shuffle, new, spelling, hybrid, extinct, domesticated, and unidentified (Table 2). Split is a name that corresponds to a subspecies rank in GenBank, but a species rank in eBird/Clements. For example, the GenBank subspecies name Otus megalotis everetti (TaxID: 56274) corresponds to the species name Otus everetti in eBird/Clements. Lump is a name that corresponds to a species rank in GenBank, but a subspecies rank in eBird/Clements.
For example, the GenBank name Megascops colombianus (TaxID: 1740167) corresponds to Megascops ingens colombianus in eBird/Clements. Shuffle is a taxon that has an equivalent rank in GenBank and eBird/Clements, but different name usage. Most often shuffles stem from changes in genera, but a few species epithets have changed because of new evidence regarding nomenclature priority. For example, the GenBank name Mimizuku gurneyi (TaxID: 56287) corresponds to Otus gurneyi in eBird/Clements, reflecting a change in the generic name. New is a species or subspecies that was undescribed when its sequences were initially uploaded to GenBank. To preserve nomenclature priority, GenBank avoids unpublished or in-press names of undescribed taxa, instead assigning an informal placeholder name. Typically, the placeholder name consists of the genus, the data uploaders' initials, and the year of first upload. For example, Megascops_sp._SMD-2015 (TaxID: 1740173) corresponds to the Santa Marta Screech-Owl, Megascops gilesi, Krabbe, 2017. Spelling is a taxon that has an equivalent name in GenBank and eBird/Clements, but for which a slightly different spelling is implemented. For example, the GenBank name Glaucidium nanum (TaxID: 126809) corresponds to the eBird/Clements name Glaucidium nana. Hybrid is a hybrid individual, usually identified in GenBank by a name comprising the putative parental species separated by a cross “x”, for example the GenBank name Strix occidentalis x Strix varia. Hybrids were not reconciled to eBird/Clements names, although eBird taxonomy does include and organize names for some frequent avian hybrid parental combinations. Extinct is an extinct taxon that is not regulated by eBird/Clements because it was not documented in the modern era. For example, the elephant bird Aepyornis maximus (TaxID: 748142) is known from Holocene bones and eggshell materials that have yielded DNA sequences, but this name is not regulated by eBird/Clements.
Domesticated is a domesticated breed or line. For example, GenBank has a listing for the domesticated “Society Finch” as Lonchura striata domestica (TaxID: 299123), but in eBird/Clements it refers to Lonchura striata because domesticated forms are not generally considered subspecies. Finally, Unidentified refers to TaxIDs where we were unable to assign a species name. These were generally samples not identified to species, or environmental DNA samples. We summarized the total number and proportion of reconciled GenBank TaxIDs by bird orders, and within the largest bird order, Passeriformes, by families. We also summarized the number of GenBank nucleotide sequences and number of reconciliations for each IUCN conservation status category. For a taxon that did not have a direct match to an IUCN name, we placed it under “Not Assessed”.

GenBank sequences associated with avian names
We tallied the number of core nucleotide sequences in GenBank associated with each taxonomic ID by downloading the “nucl_gb.accession2TaxID” file on 2 November 2020 (Data Repository D5). This file lists the accession number for each sequence in the GenBank nucleotide database and its corresponding taxonomic ID number. From this, we wrote a Perl script (Data Repository D6) to count the number of nucleotide sequences associated with each taxonomic ID corresponding to an avian taxonomic ID. To obtain counts of the number of runs in the NCBI Sequence Read Archive (SRA) associated with each bird species, we downloaded the “RunInfo” for the SRA runs (“SraRunInfo.csv”) within “Aves” on August 1, 2021 (Data Repository D7).
To obtain counts of the number of genome sequences in GenBank associated with each name, we downloaded from NCBI on September 5, 2021 a summary of the NCBI Genome files (“genome_result.txt”) within “Aves” (Data Repository D8).

Linking eBird/Clements names to geographic realms
For TaxIDs that were successfully assigned to eBird/Clements species names (either by direct name match or taxonomic reconciliation), we delimited their geographic realms using the associated IOC breeding ranges (eight terrestrial realms and four oceanic realms). Here we used IOC rather than eBird/Clements geographic information because eBird/Clements does not summarize species occurrence by geographic realm. We also manually assigned geographic realms for species without range information available in the IOC v10.1 checklist (master_ioc_list_v10.1.xlsx). We defined species that occur in only one realm as realm endemics, and species that occur in two or more realms as widespread. We then summarized the number of reconciliations and the number of GenBank nucleotide sequences for each realm and for widespread species.

Linking eBird/Clements names to other databases
We used audio data as an example to examine the extent to which name-reconciled GenBank sequences apply to large avian comparative databases, such as Macaulay Library and Xeno-canto. Since Macaulay Library uses eBird/Clements taxonomy for its bird images, audio, and video, we can readily link these media resources to the GenBank nucleotide data under the same eBird/Clements names. We downloaded a summary of available audio data (April 2021) from Macaulay Library (https://www.macaulaylibrary.org/resources/media-target-species/; Data Repository D9). We also examined Xeno-canto, a global avian vocalization database, which uses the IOC taxonomy. To match Xeno-canto’s 10,909 avian names to eBird/Clements names, we filtered out the species with a direct name match and then reconciled the remaining names using Avibase taxonomic concepts.
Lastly, we summed up the number of Xeno-canto sound recordings (October 2020; https://www.xeno-canto.org/collection/species/all; Data Repository D10) under the same eBird/Clements name. For example, the Xeno-canto name Colinus leucopogon had 26 sound recordings and Colinus cristatus had 57, but the eBird/Clements name C. cristatus would have 83, because C. leucopogon is treated as a subspecies of C. cristatus by eBird/Clements.
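The pooling step in the Colinus example can be sketched as follows. This is a minimal Python illustration, not the authors' Perl scripts; the function name `pool_counts` and the dictionary-based reconciliation table are hypothetical, and only the two counts quoted in the text are used as data.

```python
from collections import defaultdict


def pool_counts(counts, reconciliation):
    """Sum per-name counts under their reconciled (target) names.

    `counts` maps a source-taxonomy name to a number of records;
    `reconciliation` maps source names to target names. Names without an
    entry are treated as direct matches and keep their own name.
    """
    pooled = defaultdict(int)
    for name, n in counts.items():
        pooled[reconciliation.get(name, name)] += n
    return dict(pooled)


# Counts quoted in the description (Xeno-canto sound recordings).
xeno_canto = {"Colinus leucopogon": 26, "Colinus cristatus": 57}
# eBird/Clements treats C. leucopogon as a subspecies of C. cristatus.
to_ebird = {"Colinus leucopogon": "Colinus cristatus"}

print(pool_counts(xeno_canto, to_ebird))
```

The same lookup-then-sum pattern applies to the sequence tallies: once each GenBank TaxID is mapped to an eBird/Clements name, counts from several source names accumulate under one target name.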
Data from: Unlocking natural history collections to improve eDNA reference...
datadryad.org
zip
Updated Oct 16, 2025
Cite
Sarah Schmid; Nicolas Straube; Camille Albouy; Bo Delling; James Maclaine; Michael Matschiner; Peter Rask Møller; Annamaria Nocita; Anja Palandačić; Lukas Rüber; Moritz Sonnewald; Nadir Alvarez; Stéphanie Manel; Loïc Pellissier (2025). Unlocking natural history collections to improve eDNA reference databases and biodiversity monitoring [Dataset]. http://doi.org/10.5061/dryad.0zpc8677g
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.0zpc8677g
Dataset updated
Oct 16, 2025
Dataset provided by
Dryad
Authors
Sarah Schmid; Nicolas Straube; Camille Albouy; Bo Delling; James Maclaine; Michael Matschiner; Peter Rask Møller; Annamaria Nocita; Anja Palandačić; Lukas Rüber; Moritz Sonnewald; Nadir Alvarez; Stéphanie Manel; Loïc Pellissier
Time period covered
Dec 20, 2024
Description
Unlocking natural history collections to improve eDNA reference databases and biodiversity monitoring

Description of the data and file structure

The dataset consists of a main folder, data.zip.

Various

kit_custom_prices.xlsx - price estimate for DNA extraction and ssDNA library prep using a commercial kit or the custom protocol from Nicolas Straube.

barcodes_data

output from the cumul_barcodes_plot.R script.

species_with_barcodes.csv - list of all fishes (marine + freshwater) with a given barcode available, according to NCBI. (1) species name, (2) NCBI taxon ID, (3) date when the species sequence was first uploaded on NCBI, (4) marker of interest, (5) year the species sequence was first uploaded on NCBI.

occurence_data

contains different types of species lists (museum, 12S availability, etc.)

combined_gbif_species.csv - output from the script museum_potential/1_process_gbif_datasets.R. Contains all the species of fish found in the main natural ...
Data from: Exome genotyping, linkage disequilibrium and population structure...
agdatacommons.nal.usda.gov
datasetcatalog.nlm.nih.gov
bin
Updated Nov 30, 2023
Cite
Mengmeng Lu; Konstantin V. Krutovsky; C. Dana Nelson; Tomasz E. Koralewski; Thomas D. Byram; Carol A. Loopstra (2023). Data from: Exome genotyping, linkage disequilibrium and population structure in loblolly pine (Pinus taeda L.) [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/Data_from_Exome_genotyping_linkage_disequilibrium_and_population_structure_in_loblolly_pine_Pinus_taeda_L_/24662259
Explore at:
Available download formats: bin
Dataset updated
Nov 30, 2023
Dataset provided by
National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/)
Authors
Mengmeng Lu; Konstantin V. Krutovsky; C. Dana Nelson; Tomasz E. Koralewski; Thomas D. Byram; Carol A. Loopstra
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Loblolly pine (Pinus taeda L.) is one of the most widely planted and commercially important forest tree species in the USA and worldwide, and is an object of intense genomic research. However, whole genome resequencing in loblolly pine is hampered by the genome's large size and complexity and the lack of a good reference. As a valid and more feasible alternative, entire exome sequencing was hence employed to identify the gene-associated single nucleotide polymorphisms (SNPs) and to genotype the sampled trees. Resources in this dataset: Resource Title: Availability of supporting data. File Name: Web Page, url: https://doi.org/10.1186/s12864-016-3081-8 The data sets supporting the results of this article are included within the article and additional files. The raw SNP data and Illumina HiSeq short read sequences are deposited in the NCBI Single Nucleotide Polymorphism Database (dbSNP) (accession numbers ss1995911273-ss1996900602; http://www.ncbi.nlm.nih.gov/SNP) and Sequence Read Archive (SRA) (accession number SRP075763; http://www.ncbi.nlm.nih.gov/sra).
Data for: SNPs detected in pool-seq data from resistant and susceptible...
data.niaid.nih.gov
dataone.org
zip
Updated Mar 28, 2023
Cite
Chloé Haberkorn (2023). Data for: SNPs detected in pool-seq data from resistant and susceptible Cimex lectularius populations [Dataset]. http://doi.org/10.5061/dryad.9cnp5hqp6
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.9cnp5hqp6
Dataset updated
Mar 28, 2023
Dataset provided by
Laboratoire de Biométrie et Biologie Evolutive
Authors
Chloé Haberkorn
License
https://spdx.org/licenses/CC0-1.0.html
Description
In the last few years, the bed bug Cimex lectularius has been an increasing problem worldwide, mainly due to the development of insecticide resistance to pyrethroids. The characterization of resistance alleles is a prerequisite to improve surveillance and resistance management. To identify genomic variants associated with pyrethroid resistance in Cimex lectularius, we compared the genetic composition of two recent and resistant populations with that of two ancient susceptible strains using a genome-wide pool-seq design. We identified a large 6 Mb "superlocus" showing particularly high genetic differentiation and association with the resistance phenotype. This superlocus contained several clustered resistance genes, and was also characterized by a high density of structural variants (inversions, duplications). The possibility that this superlocus constitutes a resistance "supergene" that evolved after the clustering of alleles adapted to insecticide and after reduction in recombination is discussed. Methods The four strains used in this study were provided by CimexStore Ltd (Chepstow, United Kingdom). Two of these strains were susceptible to pyrethroids (S), as they were collected before their massive use and have been maintained under laboratory conditions without insecticide exposure for more than 40 years: German Lab (GL, collected in Monheim, Germany) and London Lab (LL, collected in London, Great Britain). The other two resistant (R) populations were London Field (LF, collected in 2008 in London), moderately resistant to pyrethroids, and Sweden Field (SF, collected in 2015 in Malmö, Sweden), with a moderate-to-high resistance level. For each strain, genomic DNA was extracted from 30 individual females (except for London Lab which had only 28) using NucleoSpin 96 Tissue Kit (Macherey Nagel, Hoerdt, France) and eluted in 100 μL of BE buffer.
DNA concentration of these samples was measured using Quant-iT PicoGreen Kit (ThermoFisher, Waltham MASS, USA) according to manufacturer’s instructions. Samples were then gathered with an equal DNA quantity into pools. DNA purification was performed for each pool with 1.8 times the sample volume in AMPure XP beads (Beckman Coulter, Fullerton CA, USA). Purified DNA was retrieved in 100 μL of ultrapure water. Pool concentrations were measured with Qubit using DNA HS Kit (Agilent, Santa Clara CA, USA). Final pool concentrations were as follows: 38.5 ng/μL for London Lab, 41.6 ng/μL for London Field, 40.3 ng/μL for German Lab and 38 ng/μL for Sweden Field. Sequencing was performed using TruSeq Nano Kit (Illumina, San Diego CA, USA) to produce paired-end reads of 2 x 150 bp length and a coverage of 25 X for London Lab, 32 X for London Field, 39.5 X for German Lab and 25.4 X for Sweden Field by Genotoul (Castanet-Tolosan, France). The whole pipeline with the detail of parameters used is available on GitHub (https://github.com/chaberko-lbbe/clec-poolseq). Quality control analysis of reads obtained from each line was performed using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). The raw data have been submitted to the Sequence Read Archive (SRA) database of NCBI under BioProject PRJNA826750. Sequencing reads were filtered using Trimmomatic software v0.39 (Bolger et al., 2014), which removes adaptors. FastUniq v1.1 was then used to remove PCR duplicates (Xu et al., 2012). Reads were mapped on the C. lectularius reference genome (Clec_2.1 assembly, Harlan strain) performed as part of the i5K project (Poelchau et al., 2015), with an estimated size of 510.83 Mb. Mapping was performed using BWA mem v0.7.4 (Li and Durbin, 2009). Sam files were converted to bam format using samtools v1.9, and cleaned of unmapped reads (Li et al., 2009). The 1573 nuclear scaffolds were kept in this analysis, while the mitochondrial scaffold was not considered.
Bam files corresponding to the four populations were converted into mpileup format with samtools v1.9. The mpileup file was then converted to sync format by PoPoolation2 version 1201 (Kofler et al., 2011). 8.03 million (M) SNPs were detected on this sync file using R/poolfstat package v2.0.0 (Hivert et al., 2018) and the following parameters: coverage per pool between 10 and 50. Fixation indexes (FST) were computed with R/poolfstat for each pairwise population comparison of each SNP. Global SNP pool was then trimmed on minor allele frequency (MAF) of 0.2 (computed as MAF = 0.5 − |p − 0.5|, with p being the average frequency across all four populations). This relatively high MAF value was chosen in order to remove loci for which we have very limited power to detect any association with the resistance phenotype in the BayPass analysis. BayPass v2.3 (Olazcuaga et al., 2020) was used with default parameters. The final dataset was thus reduced to 2.92M SNPs located on 990 scaffolds.
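As an illustration of the MAF definition given above (MAF = 0.5 − |p − 0.5|, with p the average allele frequency across the four pools), here is a minimal Python sketch. It is not the authors' pipeline, which uses R/poolfstat and is available in their GitHub repository. The function names, the example frequencies, and the choice to keep SNPs with MAF greater than or equal to 0.2 (rather than strictly greater) are assumptions made for illustration.

```python
def minor_allele_frequency(pool_freqs):
    """Return MAF = 0.5 - |p - 0.5|, where p is the mean allele frequency across pools."""
    p = sum(pool_freqs) / len(pool_freqs)
    return 0.5 - abs(p - 0.5)


def passes_maf_filter(pool_freqs, threshold=0.2):
    """Assumption: a SNP is kept when its MAF is at least the threshold."""
    return minor_allele_frequency(pool_freqs) >= threshold


# Hypothetical per-pool allele frequencies for two SNPs (four pools each).
balanced_snp = [0.9, 0.8, 0.1, 0.2]    # mean frequency near 0.5, high MAF
rare_snp = [0.95, 0.90, 0.85, 0.90]    # mean frequency near 0.9, low MAF

for name, freqs in [("balanced_snp", balanced_snp), ("rare_snp", rare_snp)]:
    print(name, round(minor_allele_frequency(freqs), 3), passes_maf_filter(freqs))
```

Because the formula folds the mean frequency around 0.5, a SNP with mean frequency 0.9 and one with mean frequency 0.1 receive the same MAF.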
Grass Carp diet and environmental data from the Lake Erie and Lake Michigan...
gimi9.com
Updated Sep 17, 2025
Cite
(2025). Grass Carp diet and environmental data from the Lake Erie and Lake Michigan basins from 2019 to 2022 | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_grass-carp-diet-and-environmental-data-from-the-lake-erie-and-lake-michigan-basins-from-20/
Explore at:
Dataset updated
Sep 17, 2025
Area covered
Lake Erie, Michigan
Description
Data include metabarcoding read data from grass carp associated with the Great Lakes removal program from 2019 to 2022. The anterior portion of the digestive tract was carefully separated from other organs for each captured grass carp and stored in 180 proof non-denatured ethanol. A 25 ml subsample of gut contents was used for DNA purification and sequencing using polymerase chain reactions. Samples were aligned with a custom ITS2 reference sequence database generated by downloading available ribosomal sequences from the NCBI Nucleotide database on February 5th, 2023. Algae were excluded from the database due to issues in extracting DNA from algae and minimal representation of cyanobacteria in DNA databases. Environmental covariates, such as discharge, number of road crossings, distance to cities, Lake Erie water levels, and temperature, were collated based on the capture date of grass carp to assess relationships between diet and the environment. The data set includes grass carp capture and diet information and associated environmental variables. Short reads from the carp diets used for bioinformatic comparison were deposited in the Sequence Read Archive (SRA) under BioProject PRJNA1304957, SRA Accessions SAMN50579620 - SAMN50580151.
Gene annotation of Blastobotrys mokoenaii, Blastobotrys illinoisensis, and...
data.europa.eu
unknown
Updated Mar 20, 2025
Cite
Chalmers University of Technology (2025). Gene annotation of Blastobotrys mokoenaii, Blastobotrys illinoisensis, and Blastobotrys malaysiensis [Dataset]. https://data.europa.eu/data/datasets/https-doi-org-10-17044-scilifelab-28606814?locale=nl
Explore at:
Available download formats: unknown
Dataset updated
Mar 20, 2025
Dataset authored and provided by
Chalmers University of Technology
Description
This dataset contains the gene annotation data for three species of Blastobotrys yeasts: B. mokoenaii, B. illinoisensis, and B. malaysiensis.

The genome assemblies for B. mokoenaii (NRRL Y-27120) and B. malaysiensis (NRRL Y-6417) were publicly available on the National Center for Biotechnology Information (NCBI) under accessions GCA_003705765.3 and GCA_030558815.1, respectively.

The genome assembly for B. illinoisensis (NRRL YB-1343) was generated by SciLifeLab's National Genomics Infrastructure (NGI) using PacBio long-read data and deposited in the European Nucleotide Archive (ENA) under accession GCA_965113335.1.

File description

- bmokoenaii_annotation.gff This file contains the gene models predicted for B. mokoenaii (GCA_003705765.3).
- billinoisensis_annotation.gff This file contains the gene models predicted for B. illinoisensis (GCA_965113335.1).
- bmalaysiensis_annotation.gff This file contains the gene models predicted for B. malaysiensis (GCA_030558815.1).

Gene annotation methods

Repeat Masking

Prior to annotation, a repeat library was built for each species using RepeatModeler2 v2.0.2 and the genomes were soft-masked using RepeatMasker v4.1.5.

$ RepeatModeler -database ${DB} -engine ncbi -pa 16
$ RepeatMasker -dir . -gff -u -no_is -xsmall -e ncbi -lib ${LIBRARY} -pa 16 genome.fasta

Structural Annotation

Structural annotation was performed on the soft-masked genomes using Braker3 v3.0.3 incorporating external evidence in the form of all fungal proteins from OrthoDB v11 (available at https://bioinf.uni-greifswald.de/bioinf/partitioned_odb11).

$ braker.pl --genome="$genome" \
    --prot_seq=${protein} --workingdir=${PWD} \
    --gff3 --threads=16 --verbosity=3 \
    --nocleanup --species=${i}

Functional Annotation

The predicted genes were functionally annotated using the National Bioinformatics Infrastructure Sweden (NBIS) functional_annotation nextflow pipeline v2.0.0 (https://github.com/NBISweden/pipelines-nextflow). Briefly, this pipeline performs similarity searches between the annotated proteins and the UniProtKB/Swiss-Prot database (downloaded on 2023-12) using the Basic Local Alignment Search Tool (BLAST). It then uses InterProScan to query the proteins against InterPro v59-91 databases, and merges results using AGAT v1.2.0.

tRNAs and rRNAs

Transfer RNA (tRNA) and ribosomal RNA (rRNA) genes were annotated using tRNAscan-SE v2.0.12 and barrnap v0.9, respectively. Other ncRNAs, such as SRP RNA, RNase P RNA, and spliceosomal ncRNAs, have not been predicted.

$ tRNAscan-SE -E --gff ${output}_trnas.gff --thread 16 ${genome}.fasta
$ barrnap --kingdom euk --threads 6 ${genome}.fasta > ${output}_rrna.gff

Annotation integration

Finally, the functionally annotated protein-coding genes, tRNAs, and rRNAs were combined into a single GFF file using AGAT v1.2.0.

$ agat_sp_complement_annotations.pl --ref ${protein_coding} --add ${trna} --add ${rrna} --out full_annotation.gff
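
A quick sanity check on a combined annotation such as full_annotation.gff is to tally the feature types in the third column of the GFF and confirm that protein-coding genes, tRNAs, and rRNAs are all present. The Python sketch below is illustrative only and is not part of the dataset's pipeline. The function name and the sample records are hypothetical. It relies only on the standard GFF layout of nine tab-separated columns.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Tally the feature type (column 3) of each record in GFF-formatted lines."""
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A valid GFF record has nine tab-separated columns.
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


# Hypothetical records for illustration; they are not taken from the dataset.
sample = [
    "##gff-version 3",
    "contig_1\tAUGUSTUS\tgene\t100\t900\t.\t+\t.\tID=g1",
    "contig_1\tAUGUSTUS\tmRNA\t100\t900\t.\t+\t.\tID=g1.t1;Parent=g1",
    "contig_1\ttRNAscan-SE\ttRNA\t1500\t1572\t.\t-\t.\tID=trna1",
    "contig_2\tbarrnap\trRNA\t200\t1990\t.\t+\t.\tName=18S_rRNA",
]

for feature_type, n in sorted(count_feature_types(sample).items()):
    print(feature_type, n)
```

To run it on a real file, pass an open file handle, for example `count_feature_types(open("full_annotation.gff"))`.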
D-BeONE.1.2 BeONE dataset
openagrar.de
Updated Dec 30, 2021
Cite
Verónica Mixão; Holger Brendebach; Simon Tausch; Miguel Pinto; Carlus Deneke; Karin Lagesen; Vítor Borges (2021). D-BeONE.1.2 BeONE dataset [Dataset]. http://doi.org/10.5281/zenodo.7335590
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.7335590
Dataset updated
Dec 30, 2021
Dataset provided by
INSA
NVI
BfR
Authors
Verónica Mixão; Holger Brendebach; Simon Tausch; Miguel Pinto; Carlus Deneke; Karin Lagesen; Vítor Borges
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In order to contribute to the accomplishment of specific objectives of the BeOne project, WP1-T2 compiled an anonymized dataset (including sequencing reads and respective metadata) aiming to capture the genomic diversity within the populations of Listeria monocytogenes, Salmonella enterica, Escherichia coli (STEC) and Campylobacter jejuni. This dataset includes data shared by the BeOne partners and comprises a total of 3,884 isolates, for which the anonymized sequencing reads were released in the European Nucleotide Archive (ENA) and the anonymized genome assemblies in the Zenodo repository [1,426 L. monocytogenes (accession: PRJEB57166 and 10.5281/zenodo.7267486); 1,540 S. enterica (accession: PRJEB57179 and 10.5281/zenodo.7267785); 308 E. coli (accession: PRJEB57098 and 10.5281/zenodo.7267844); 610 C. jejuni (accession: PRJEB57119 and 10.5281/zenodo.7267879)]. As a complement to the BeOne dataset, additional samples were carefully selected among the WGS data publicly available at the beginning of the analysis (November 2021) in ENA or the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA), in order to ensure the representativeness of the genomic diversity within public databases (assessed in terms of sequence type or serotype, depending on the species). In the end, a so-called “public dataset” with the 8,383 samples that passed the curation step was released in the Zenodo repository [1,874 L. monocytogenes (accession: 10.5281/zenodo.7116878); 1,434 S. enterica (accession: 10.5281/zenodo.7119735); 1,999 E. coli (accession: 10.5281/zenodo.7120057); 3,076 C. jejuni (accession: 10.5281/zenodo.7120166)].
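The per-species counts quoted above can be checked against the stated totals with a few lines of Python. This is only an arithmetic sketch using the numbers in the description. The dictionary and function names are chosen for illustration.

```python
# Per-species isolate counts as quoted in the dataset description.
beone_counts = {
    "L. monocytogenes": 1426,
    "S. enterica": 1540,
    "E. coli": 308,
    "C. jejuni": 610,
}
public_counts = {
    "L. monocytogenes": 1874,
    "S. enterica": 1434,
    "E. coli": 1999,
    "C. jejuni": 3076,
}


def total(counts):
    """Sum the per-species counts."""
    return sum(counts.values())


print("BeOne dataset:", total(beone_counts))    # 3884, matching the stated 3,884 isolates
print("Public dataset:", total(public_counts))  # 8383, matching the stated 8,383 samples
```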
Data from: Evidence supporting the microbiota-gut-brain axis in a songbird
data-staging.niaid.nih.gov
zip
Updated Oct 20, 2020
Cite
Morgan Slevin; Jennifer Houtz; David Bradshaw II; Rindy Anderson (2020). Evidence supporting the microbiota-gut-brain axis in a songbird [Dataset]. http://doi.org/10.5061/dryad.8gtht76mc
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.8gtht76mc
Dataset updated
Oct 20, 2020
Dataset provided by
Cornell University
Florida Atlantic University
Authors
Morgan Slevin; Jennifer Houtz; David Bradshaw II; Rindy Anderson
License
https://spdx.org/licenses/CC0-1.0.html
Description
Recent research in mammals supports a link between cognitive ability and the gut microbiome, but little is known about this relationship in other taxa. In a captive population of 38 Zebra Finches (Taeniopygia guttata), we quantified performance on cognitive tasks measuring learning and memory. We sampled the gut microbiome via cloacal swab and quantified bacterial alpha and beta diversity. Performance on cognitive tasks related to beta diversity but not alpha diversity. We then identified differentially abundant genera influential in the beta diversity differences among cognitive performance categories. Though correlational, this study provides some of the first evidence of an avian microbiota-gut-brain axis, building foundations for future microbiome research in wild populations and during host development.

Methods Adult Zebra Finch gut microbiome data was assessed via cloacal swab. We extracted DNA using PowerSoil DNA Isolation Kits (Qiagen, Germany) following slight modification to manufacturer instructions, amplified the V4 region of the 16S rRNA gene using modified primers 515F/806R with Illumina adaptors following the Earth Microbiome Protocol for PCR, and submitted final pooled PCR products to Cornell’s Biotechnology Resource Center for quantification, normalization, library preparation, and sequencing. In total, we sequenced 72 cloacal swab samples and 14 negative controls in one Illumina MiSeq paired-end 2 x 250 bp run.

Using Quantitative Insights into Microbial Ecology 2 (QIIME2), raw sequences were trimmed of their primers, joined, per-nucleotide-quality-filtered, and denoised. Amplicon Sequence Variants (ASVs) were annotated using the Scikit-learn system and the SILVA 132 database; mitochondria, chloroplasts, and unassigned sequences were filtered out. ASVs were aligned using MAFFT and masked to make a midpoint-rooted phylogenetic tree using FASTTREE. We decontaminated samples with package decontam in R (52) using negative controls and DNA yield. ASVs with <10 sequences across all samples were removed. Filtered sequences were CSS-normalized using package metagenomeSeq in R. Mean sequencing depth was 18758.5±1234.9 reads before decontamination, filtering, and normalization, and 380440.2±47429.3 reads afterwards. Raw sequences were submitted to NCBI’s Sequence Read Archive (BioProject PRJNA636961). Snakemake files (pre-configured coding loops) used for sequence analysis, and R scripts for statistical analysis, are available on GitHub (https://github.com/djbradshaw2/General_16S_Amplicon_Sequencing_Analysis).
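
The low-abundance filter described above (removing ASVs with fewer than 10 sequences summed across all samples) can be sketched in Python as follows. The authors' actual analysis uses QIIME2, Snakemake, and R, so this is only an illustration of the rule. The function name, the data layout (a mapping from ASV identifier to per-sample counts), and the example values are assumptions.

```python
def filter_low_abundance(asv_counts, min_total=10):
    """Keep ASVs whose count summed over all samples is at least min_total."""
    return {
        asv: counts
        for asv, counts in asv_counts.items()
        if sum(counts) >= min_total
    }


# Hypothetical ASV table: each ASV maps to its counts in three samples.
asv_table = {
    "ASV_1": [120, 85, 230],  # total 435, kept
    "ASV_2": [3, 0, 4],       # total 7, removed
    "ASV_3": [0, 10, 0],      # total 10, kept (not fewer than 10)
}

kept = filter_low_abundance(asv_table)
print(sorted(kept))
```

The threshold applies to the total across samples, so an ASV found in a single sample is still kept if it reaches 10 sequences there.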

For comparison to microbiome characteristics, we quantified cognitive performance on three tasks that measure learning and memory: a novel foraging task, a color association task, and a color reversal task, with birds first presented with a neophobia test (latency to approach the foraging grid used in cognitive tasks). Briefly, the novel foraging task employs operant conditioning and stepwise shaping to teach a novel foraging technique: birds learned to pry opaque blue and white lids from wells to obtain a seed reward (same seed as regular diet). The performance measure was the number of trials required to learn to remove lids to obtain the reward. Once birds mastered prying lids from wells, they were again presented with blue and white lids, but only one color was rewarded. This color association task tests associative learning: the ability to form a mental connection between multiple stimuli. The performance measure was the number of trials required to learn to remove the rewarded lids first before any unrewarded lids. Finally, the color reversal task (the rewarded color is switched) is also an associative learning task, but also tests for behavioral flexibility. Performance was measured as the number of trials to stop removing the formerly rewarded lid color and instead remove the newly rewarded color. For all tasks, cognitive performance is an inverted variable: a low number of trials required to pass signifies high performance. Subjects were tested after 4 hours of food restriction, ensuring motivation to obtain food rewards. Each bird was tested individually (visually but not acoustically isolated from other test subjects) for 4 hours each day, consisting of eight two-min trials separated by 20 min. We viewed and scored trials remotely via video. Continued motivation to eat was confirmed at the end of each test day by returning the normal full seed dish and observing the bird’s latency to eat.
Cite

(2024). DDBJ Sequence Read Archive [Dataset]. http://identifiers.org/RRID:SCR_001370/resolver/mentions?q=&i=rrid

Data from: DDBJ Sequence Read Archive

RRID:SCR_001370, r3d100013696, OMICS_01027, nlx_152515, DDBJ Sequence Read Archive (RRID:SCR_001370), DRA

Explore at:

25 scholarly articles cite this dataset (View in Google Scholar)

Unique identifier

https://identifiers.org/RRID:SCR_001370 https://identifiers.org/RRID:SCR_001370/resolver/mentions?q=&i=rrid

Dataset updated

Sep 9, 2024

Description

Archive database for output data generated by next-generation sequencing machines including Roche 454 GS System, Illumina Genome Analyzer, Applied Biosystems SOLiD System, and others. DRA is a member of the International Nucleotide Sequence Database Collaboration (INSDC) and archives the data in close collaboration with NCBI Sequence Read Archive (SRA) and EBI Sequence Read Archive (ERA). Please submit the trace data from conventional capillary sequencers to DDBJ Trace Archive. THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 16, 2025.