54 datasets found
  1. NCBI Trace Archive

    • integbio.jp
    Cite
    NCBI (National Center for Biotechnology Information), NCBI Trace Archive [Dataset]. https://integbio.jp/dbcatalog/en/record/nbdc01944
    Dataset provided by
    National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/)
    Description

    The Trace Archives comprise three archives. The Sequence Read Archive (SRA) stores raw sequence data from "next-generation" sequencing technologies including 454, IonTorrent, Illumina, SOLiD, Helicos and Complete Genomics. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence. The Trace Archive serves as the repository for sequencing data from gel/capillary platforms such as Applied Biosystems ABI 3730. The Trace Assembly Archive stores pairwise and multiple alignments of sequencing reads, linking basic trace data with finished genomic sequence as found in GenBank.

  2. Catalog of NCBI sequence read archive (SRA) data for salamanders at the...

    • portal.edirepository.org
    csv
    Updated Apr 9, 2024
    Cite
    Catalog of NCBI sequence read archive (SRA) data for salamanders at the Hubbard Brook Experimental Forest 2012-2021 [Dataset]. https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-hbr&identifier=398
    Available download formats: csv (220695 bytes), csv (312227 bytes), csv (282251 bytes)
    Dataset updated
    Apr 9, 2024
    Dataset provided by
    EDI
    Authors
    Brett Addis; Madaline Cochrane; Winsor Lowe
    Time period covered
    2012 - 2021
    Variables measured
    strain, ecotype, isolate, lat_lon, cultivar, organism, Accession, BioProject, env_medium, sample_URL, and 8 more
    Description

    This project was designed to describe fine-scale population genetic differentiation of the stream salamander Gyrinophilus porphyriticus among five study streams in the Hubbard Brook Experimental Forest. The data are paired with intensive capture-recapture data to assess direct fitness effects of individual genetic diversity, including effects of individual multilocus heterozygosity on stage-specific survival probabilities.

       This dataset publishes a manifest of the genomic sequence reads submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). These samples are published at NCBI under the BioProject ID 1090913 (https://www.ncbi.nlm.nih.gov/bioproject/1090913). The tables here include sample metadata and the NCBI URLs to each sample.
    
       These data were gathered as part of the Hubbard Brook Ecosystem Study (HBES). The HBES is a collaborative effort at the Hubbard Brook Experimental Forest, which is operated and maintained by the USDA Forest Service, Northern Research Station.
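
    The BioProject link quoted in this description follows a simple pattern: the NCBI BioProject base URL followed by the identifier. A minimal Python sketch of that pattern is given here, assuming identifiers are either purely numeric IDs (as in this entry) or accessions made of PRJ, two letters, and digits. The function name `bioproject_url` and the validation rule are illustrative assumptions, not part of the dataset or of any NCBI tool.

```python
import re

# Base URL taken from the link quoted in the dataset description.
BIOPROJECT_BASE = "https://www.ncbi.nlm.nih.gov/bioproject/"

# Assumed identifier shapes: a numeric ID (e.g. "1090913") or an
# accession such as "PRJNA887384". This check is illustrative only.
_ID_PATTERN = re.compile(r"^(?:\d+|PRJ[A-Z]{2}\d+)$")


def bioproject_url(identifier: str) -> str:
    """Build the BioProject page URL for a numeric ID or PRJ accession."""
    ident = identifier.strip()
    if not _ID_PATTERN.match(ident):
        raise ValueError(f"unrecognized BioProject identifier: {identifier!r}")
    return BIOPROJECT_BASE + ident


if __name__ == "__main__":
    print(bioproject_url("1090913"))
```

    Rejecting unexpected strings early keeps malformed rows in a sample manifest from silently producing dead links.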
    
  3. NCBI Sequence Read Archive (SRA) accession numbers for fastq sequence files...

    • dataone.org
    • search.dataone.org
    Updated Mar 9, 2025
    Cite
    Erica Goetze (2025). NCBI Sequence Read Archive (SRA) accession numbers for fastq sequence files for each zooplankton community sample (Plankton Population Genetics project) [Dataset]. http://doi.org/10.1575/1912/bco-dmo.704665
    Dataset updated
    Mar 9, 2025
    Dataset provided by
    Biological and Chemical Oceanography Data Management Office (BCO-DMO)
    Authors
    Erica Goetze
    Time period covered
    Jun 13, 2014 - Jun 19, 2014
    Description

    These data include sample information and accession links to sequence data at the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).

    This data submission consists of metabarcoding data for the zooplankton community in the epipelagic, mesopelagic and upper bathypelagic zones (0-1500 m) of the North Pacific Subtropical Gyre. The goal of this study was to assess the hidden diversity present in zooplankton assemblages in midwaters, and to detect vertical gradients in species richness, depth distributions, and community composition of the full zooplankton assemblage. Samples were collected in June 2014 from Station ALOHA (22.75, -158) using a 1 meter square Multiple Opening and Closing Nets and Environmental Sampling System (MOCNESS, 200 µm mesh), on R/V Falkor cruise FK140613. Next generation sequence data (Illumina MiSeq, V3 chemistry, 300-bp paired-end) of the zooplankton assemblage derive from amplicons of the V1-V2 region of 18S rRNA (primers described in Fonseca et al. 2010). The data include sequences and read count abundance information for molecular OTUs from both holoplanktonic and meroplanktonic taxa.

    Related dataset containing OTU tables and fasta sequences (representative / most abundant read for each OTU):
    Metabarcoding zooplankton at station ALOHA: OTU tables and fasta files

  4. The OHEJP BeONE Project – Escherichia coli genome assembly dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin, zip
    Updated Jul 24, 2023
    Cite
    Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI (2023). The OHEJP BeONE Project – Escherichia coli genome assembly dataset [Dataset]. http://doi.org/10.5281/zenodo.7267845
    Available download formats: zip, bin
    Dataset updated
    Jul 24, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset comprises the genome assemblies of 308 Escherichia coli samples collected by the BeONE Consortium on behalf of the One Health European Joint Programme “BeONE: Building Integrative Tools for One Health Surveillance” (https://onehealthejp.eu/jrp-beone/). Additionally, a complementary dataset is also made available (https://zenodo.org/record/7120057), comprising genome assemblies of 1,999 E. coli samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).

    File “BeONE_Ec_metadata.xlsx” contains the genome assembly statistics for each isolate, including European Nucleotide Archive accession numbers, in-silico Multi Locus Sequence Type and Serotype.

    The archive “BeONE_Ec_assemblies.zip” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.

    Dataset selection and curation

    This anonymized dataset of E. coli genome assemblies was generated using Next Generation Sequencing data collected within the BeONE Consortium available at the European Nucleotide Archive under BioProject Accession Number PRJEB57098. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the others, we noticed that a considerable proportion of assemblies was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 308 isolates passed the dataset curation step and were included in the final dataset. In-silico serotyping was performed with seq_typing v2.2.
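
    The curation rule in this paragraph can be read as a two-step filter: keep assemblies that pass QC, and recover those that failed only on "NumContamSNVs" when more than 98% of reads match the expected species. The Python sketch below illustrates that reading. The record layout (`AssemblyQC`, `failed_checks`, `pct_reads_correct_species`) and the second check name `N50` in the demo are hypothetical and do not reflect the Aquamis output format, and the manual inspection step mentioned in the text is not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class AssemblyQC:
    """Hypothetical per-assembly QC summary (not an Aquamis format)."""
    sample: str
    failed_checks: FrozenSet[str]      # QC parameters that failed; empty = pass
    pct_reads_correct_species: float   # percentage, 0-100


def keep_assembly(qc: AssemblyQC, min_pct: float = 98.0) -> bool:
    """Return True if the assembly belongs in the final dataset."""
    if not qc.failed_checks:
        return True  # passed every QC check
    # Recover assemblies that failed exclusively on NumContamSNVs,
    # provided more than `min_pct` percent of reads match the species.
    only_contam = qc.failed_checks == frozenset({"NumContamSNVs"})
    return only_contam and qc.pct_reads_correct_species > min_pct


def curate(records: List[AssemblyQC]) -> List[str]:
    """Return the sample names retained after curation, in input order."""
    return [r.sample for r in records if keep_assembly(r)]


if __name__ == "__main__":
    demo = [
        AssemblyQC("s1", frozenset(), 99.5),
        AssemblyQC("s2", frozenset({"NumContamSNVs"}), 98.7),
        AssemblyQC("s3", frozenset({"NumContamSNVs"}), 97.0),
        AssemblyQC("s4", frozenset({"NumContamSNVs", "N50"}), 99.9),
    ]
    print(curate(demo))
```

    Comparing the whole set of failed checks, rather than testing membership, is what enforces the "exclusively" condition: an assembly that also failed another check is not recovered.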

    Funding

    This work was supported by funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 773830: One Health European Joint Programme.

  5. Sample collection information and sequence accessions at the National Center...

    • search.dataone.org
    • bco-dmo.org
    Updated Mar 9, 2025
    Cite
    John J. Stachowicz (2025). Sample collection information and sequence accessions at the National Center for Biotechnology Information (NCBI) for whole genome sequencing of eelgrass (Zostera marina) collected at Bodega and Tomales Bay, CA, USA from July to September 2019 [Dataset]. https://search.dataone.org/view/sha256%3Ad19f54c6d0afe04071aea24c0b6e1cec4b1bc13161f061822b35b92a253fe865
    Dataset updated
    Mar 9, 2025
    Dataset provided by
    Biological and Chemical Oceanography Data Management Office (BCO-DMO)
    Authors
    John J. Stachowicz
    Time period covered
    Jul 16, 2019 - Sep 30, 2019
    Description

    This dataset includes sample collection information and sequence accessions at the National Center for Biotechnology Information (NCBI) for whole genome sequencing of eelgrass (Zostera marina) collected at Bodega and Tomales Bay, California, USA from July to September 2019. Sequence Read Archive (SRA) Experiments and BioSamples can be accessed from the NCBI BioProject PRJNA887384 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA887384/).

    Results summary as described in Scheibelhut et al. (2023): We examine genomic signals of selection in the eelgrass Zostera marina across temperature gradients in adjacent embayments. Although we find many genomic regions with signals of selection within each bay, there is very little overlap in signals of selection at the SNP level, despite most polymorphisms being shared across bays. We do find overlap at the gene level, potentially suggesting multiple mutational pathways to the same phenotype. Using polygenic models, we find that some sets of candidate SNPs are able to predict temperature across both bays, suggesting that small but parallel shifts in allele frequencies may be missed by independent genome scans. Together, these results highlight the continuous rather than binary nature of parallel evolution in polygenic traits and the complexity of evolutionary predictability.

  6. The OHEJP BeONE Project – Listeria monocytogenes genome assembly dataset

    • zenodo.org
    • explore.openaire.eu
    bin, zip
    Updated Jul 24, 2023
    Cite
    Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI (2023). The OHEJP BeONE Project – Listeria monocytogenes genome assembly dataset [Dataset]. http://doi.org/10.5281/zenodo.7267487
    Available download formats: bin, zip
    Dataset updated
    Jul 24, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset comprises the genome assemblies of 1,426 Listeria monocytogenes samples collected by the BeONE Consortium on behalf of the One Health European Joint Programme “BeONE: Building Integrative Tools for One Health Surveillance” (https://onehealthejp.eu/jrp-beone/). Additionally, a complementary dataset is also made available (https://zenodo.org/record/7116878), comprising genome assemblies of 1,874 L. monocytogenes samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).

    File “BeONE_Lm_metadata.xlsx” contains the genome assembly statistics for each isolate, including European Nucleotide Archive accession numbers and in-silico Multi Locus Sequence Type.

    The archive “BeONE_Lm_assemblies.zip” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.

    Dataset selection and curation

    This anonymized dataset of L. monocytogenes genome assemblies was generated using Next Generation Sequencing data collected within the BeONE Consortium available at the European Nucleotide Archive under BioProject Accession Number PRJEB57166. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the others, we noticed that a considerable proportion of assemblies was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 1,426 isolates passed the dataset curation step and were included in the final dataset.

    Funding

    This work was supported by funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 773830: One Health European Joint Programme.

  7. Whole genome sequencing of three North American large-bodied birds

    • datasets.ai
    • data.usgs.gov
    55
    Updated Sep 11, 2024
    Cite
    Department of the Interior (2024). Whole genome sequencing of three North American large-bodied birds [Dataset]. https://datasets.ai/datasets/whole-genome-sequencing-of-three-north-american-large-bodied-birds
    Available download formats: 55
    Dataset updated
    Sep 11, 2024
    Dataset authored and provided by
    Department of the Interior
    Description

    The data release details the samples, methods, and raw data used to generate high-quality genome assemblies for greater sage-grouse (Centrocercus urophasianus), white-tailed ptarmigan (Lagopus leucura), and trumpeter swan (Cygnus buccinator). The raw data have been deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI), the authoritative repository for public biological sequence data, and are not included in this data release. Instead, the accessions that link to those data via the NCBI portal (www.ncbi.nlm.nih.gov) are provided herein. The release consists of a single file, sample.metadata.txt, which maps NCBI accessions to the samples sequenced and the different types of sequencing performed to generate the assemblies and annotate their gene features.

  8. The OHEJP BeONE Project – Campylobacter jejuni genome assembly dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin, zip
    Updated Jul 24, 2023
    Cite
    Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI (2023). The OHEJP BeONE Project – Campylobacter jejuni genome assembly dataset [Dataset]. http://doi.org/10.5281/zenodo.7802717
    Available download formats: bin, zip
    Dataset updated
    Jul 24, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset comprises the genome assemblies of 610 Campylobacter jejuni samples collected by the BeONE Consortium on behalf of the One Health European Joint Programme “BeONE: Building Integrative Tools for One Health Surveillance” (https://onehealthejp.eu/jrp-beone/). Additionally, a complementary dataset is also made available (https://zenodo.org/record/7120166), comprising genome assemblies of 3,076 C. jejuni samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).

    File “BeONE_Cj_metadata.xlsx” contains the genome assembly statistics for each isolate, including European Nucleotide Archive accession numbers and in-silico Multi Locus Sequence Type, and information regarding year of sampling, country and source.

    The archive “BeONE_Cj_assemblies.zip” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.

    Dataset selection and curation

    This anonymized dataset of C. jejuni genome assemblies was generated using Next Generation Sequencing data collected within the BeONE Consortium available at the European Nucleotide Archive under BioProject Accession Number PRJEB57119. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the others, we noticed that a considerable proportion of assemblies was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 610 isolates passed the dataset curation step and were included in the final dataset.

    Funding

    This work was supported by funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 773830: One Health European Joint Programme.

    Acknowledgements

    We thank the National Distributed Computing Infrastructure of Portugal (INCD) for providing the necessary resources to run the genome assemblies. INCD was funded by FCT and FEDER under the project 22153-01/SAICT/2016.

  9. 1000 Cannabis Genomes Project

    • kaggle.com
    zip
    Updated Feb 26, 2019
    Cite
    Google BigQuery (2019). 1000 Cannabis Genomes Project [Dataset]. https://www.kaggle.com/bigquery/genomics-cannabis
    Available download formats: zip (0 bytes)
    Dataset updated
    Feb 26, 2019
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Cannabis is a genus of flowering plants in the family Cannabaceae.

    Source: https://en.wikipedia.org/wiki/Cannabis

    Content

    In October 2016, Phylos Bioscience released a genomic open dataset of approximately 850 strains of Cannabis via the Open Cannabis Project. In combination with other genomics datasets made available by Courtagen Life Sciences, Michigan State University, NCBI, Sunrise Medicinal, University of Calgary, University of Toronto, and Yunnan Academy of Agricultural Sciences, the total amount of publicly available data exceeds 1,000 samples taken from nearly as many unique strains.

    https://medium.com/google-cloud/dna-sequencing-of-1000-cannabis-strains-publicly-available-in-google-bigquery-a33430d63998

    These data were retrieved from the National Center for Biotechnology Information’s Sequence Read Archive (NCBI SRA), processed using the BWA aligner and FreeBayes variant caller, indexed with the Google Genomics API, and exported to BigQuery for analysis. Data are available directly from Google Cloud Storage at gs://gcs-public-data--genomics/cannabis, as well as via the Google Genomics API as dataset ID 918853309083001239, and an additional duplicated subset of only transcriptome data as dataset ID 94241232795910911, as well as in the BigQuery dataset bigquery-public-data:genomics_cannabis.

    All tables in the Cannabis Genomes Project dataset have a suffix like _201703. The suffix is referred to as [BUILD_DATE] in the descriptions below. The dataset is updated frequently as new releases become available.

    The following tables are included in the Cannabis Genomes Project dataset:

    Sample_info contains fields extracted for each SRA sample, including the SRA sample ID and other data that give indications about the type of sample. Sample types include: strain, library prep methods, and sequencing technology. See SRP008673 for an example of upstream sample data. SRP008673 is the University of Toronto sequencing of Cannabis Sativa subspecies Purple Kush.

    MNPR01_reference_[BUILD_DATE] contains reference sequence names and lengths for the draft assembly of Cannabis Sativa subspecies Cannatonic produced by Phylos Bioscience. This table contains contig identifiers and their lengths.

    MNPR01_[BUILD_DATE] contains variant calls for all included samples and types (genomic, transcriptomic) aligned to the MNPR01_reference_[BUILD_DATE] table. Samples can be found in the sample_info table. The MNPR01_[BUILD_DATE] table is exported using the Google Genomics BigQuery variants schema. This table is useful for general analysis of the Cannabis genome.

    MNPR01_transcriptome_[BUILD_DATE] is similar to the MNPR01_[BUILD_DATE] table, but it includes only the subset transcriptomic samples. This table is useful for transcribed gene-level analysis of the Cannabis genome.
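
    Because every build shares the same naming scheme, the three MNPR01 table names can be derived from a build date. A small Python sketch of that derivation follows, assuming the suffix is a six-digit year-month string like the 201703 example. The function names, the dictionary keys, and the choice to join names in `project:dataset.table` form are illustrative, not part of the dataset documentation.

```python
import re

# Dataset identifier as written in the description.
DATASET = "bigquery-public-data:genomics_cannabis"


def mnpr01_tables(build_date: str) -> dict:
    """Map a short label to each MNPR01 table name for one build."""
    # Assumption: the suffix is six digits (YYYYMM), as in "_201703".
    if not re.fullmatch(r"\d{6}", build_date):
        raise ValueError(f"expected a six-digit build date, got {build_date!r}")
    return {
        "reference": f"MNPR01_reference_{build_date}",
        "variants": f"MNPR01_{build_date}",
        "transcriptome": f"MNPR01_transcriptome_{build_date}",
    }


def qualified(table: str) -> str:
    """Prefix a table name with the dataset, in project:dataset.table form."""
    return f"{DATASET}.{table}"


if __name__ == "__main__":
    for label, name in mnpr01_tables("201703").items():
        print(label, qualified(name))
```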


    Acknowledgements

    Dataset Source: http://opencannabisproject.org/
    Category: Genomics
    Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://www.ncbi.nlm.nih.gov/home/about/policies.shtml - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
    Update frequency: As additional data are released to GenBank
    View in BigQuery: https://bigquery.cloud.google.com/dataset/bigquery-public-data:genomics_cannabis
    View in Google Cloud Storage: gs://gcs-public-data--genomics/cannabis

    Banner Photo by Rick Proctor from Unsplash.

    Inspiration

    Which Cannabis samples are included in the variants table?

    Which contigs in the MNPR01_reference_[BUILD_DATE] table have the highest density of variants?

    How many variants does each sample have at the THC Synthase gene (THCA1) locus?
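
    For the contig-density question, one plausible definition is variants per kilobase: the variant count on a contig divided by its length, scaled by 1,000. The Python sketch below works on plain dictionaries with made-up contig names and numbers. It does not query BigQuery, and the per-kilobase scaling is an assumption about what "density" means here.

```python
from typing import Dict, List, Tuple


def variant_density(
    contig_lengths: Dict[str, int],
    variant_counts: Dict[str, int],
) -> List[Tuple[str, float]]:
    """Return (contig, variants per kilobase) pairs, densest first."""
    densities = []
    for contig, length in contig_lengths.items():
        if length <= 0:
            raise ValueError(f"contig {contig!r} has non-positive length")
        # Contigs with no recorded variants get a density of zero.
        count = variant_counts.get(contig, 0)
        densities.append((contig, 1000.0 * count / length))
    return sorted(densities, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical contig names and values, for illustration only.
    lengths = {"contig_A": 2000, "contig_B": 1000, "contig_C": 500}
    counts = {"contig_A": 10, "contig_B": 20}
    for contig, per_kb in variant_density(lengths, counts):
        print(contig, per_kb)
```

    Normalizing by length matters because raw variant counts would favor long contigs regardless of how variable they are.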

  10. The OHEJP BeONE Project – Salmonella enterica genome assembly dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin, zip
    Updated Jul 24, 2023
    Cite
    Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI (2023). The OHEJP BeONE Project – Salmonella enterica genome assembly dataset [Dataset]. http://doi.org/10.5281/zenodo.7802723
    Available download formats: zip, bin
    Dataset updated
    Jul 24, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset comprises the genome assemblies of 1,540 Salmonella enterica samples collected by the BeONE Consortium on behalf of the One Health European Joint Programme “BeONE: Building Integrative Tools for One Health Surveillance” (https://onehealthejp.eu/jrp-beone/). Additionally, a complementary dataset is also made available (https://zenodo.org/record/7119735), comprising genome assemblies of 1,434 S. enterica samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).

    File “BeONE_Se_metadata.xlsx” contains the genome assembly statistics for each isolate, including European Nucleotide Archive accession numbers, in-silico Multi Locus Sequence Type and Serotype, and information regarding year of sampling, country and source.

    The archive “BeONE_Se_assemblies.zip” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.

    Dataset selection and curation

    This anonymized dataset of S. enterica genome assemblies was generated using Next Generation Sequencing data collected within the BeONE Consortium available at the European Nucleotide Archive under BioProject Accession Number PRJEB57179. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the others, we noticed that a considerable proportion of assemblies was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 1,540 isolates passed the dataset curation step and were included in the final dataset. In-silico serotyping was performed with SeqSero2 v1.2.1 (Zhang et al. 2019).

    Funding

    This work was supported by funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 773830: One Health European Joint Programme.

    Acknowledgements

    We thank the National Distributed Computing Infrastructure of Portugal (INCD) for providing the necessary resources to run the genome assemblies. INCD was funded by FCT and FEDER under the project 22153-01/SAICT/2016.

  11.

    Metabarcode sequencing of aquatic environmental DNA from the Potomac River...

    • catalog.data.gov
    • gimi9.com
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Metabarcode sequencing of aquatic environmental DNA from the Potomac River Watershed, 2015-2020 [Dataset]. https://catalog.data.gov/dataset/metabarcode-sequencing-of-aquatic-environmental-dna-from-the-potomac-river-watershed-2015-
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Potomac River
    Description

    Biological indicator taxa have long been used for integrative assessments of water quality, particularly benthic invertebrate groups such as arthropods. While standardized protocols have been developed to calculate 'biological index' scores based on the abundances of these taxa, such systems are challenging to implement at large scales due to the sampling effort required, the taxonomic expertise needed, and the need for repeated sampling to reliably discriminate sites. Many of the same taxa detected by traditional surveys can also be detected by genetic analysis of environmental DNA (eDNA), potentially allowing for an alternative formulation of biological indexes that might be faster and more economical to produce. The current data were produced to evaluate eDNA-derived biological indexes at sites within the Potomac River watershed of the eastern United States, specifically within units of the National Park Service for which previous biological assessment data were available. This data release consists of five files:

    1. sample.metadata.txt, which contains sampling metadata and identifiers linking to sample-derived sequence data that has been deposited in the Sequence Read Archive of the National Center for Biotechnology Information (NCBI). This database is authoritative and comprehensive for sharing high-throughput sequence data produced with public funds. All accessions listed in the file can be searched to retrieve sample and sequence information at www.ncbi.nlm.nih.gov.
    2. cox1.references.fasta, which contains reference sequences of the cytochrome c oxidase 1 gene of arthropods (typically abbreviated cox1 or COI), identified from regional checklists. The file is a text file in FASTA format.
    3. mt16S.references.fasta, which contains reference sequences of the mitochondrial 16S ribosomal RNA (mt16S) gene of arthropods identified from regional checklists. The file is a text file in FASTA format.
    4. first.stage.counts.txt, which is a tab-delimited table of counts of sequences that are attributed to each taxon from each sample for the first stage of the study. Whether the taxon attribution is from the mt16S or cox1 locus is also indicated.
    5. second.stage.counts.txt, which is a tab-delimited table of counts of sequences that are attributed to each taxon from each sample for the second stage of the study. Whether the taxon attribution is from the mt16S or cox1 locus is also indicated.

  12.

    NCBI accession metadata for 18S rRNA gene tag sequences from DNA and RNA...

    • darchive.mblwhoilibrary.org
    • bco-dmo.org
    • +1more
    pdf, text/tsv, txt +2
    Updated Jul 24, 2019
    Cite
    Sarah K Hu; David Caron (2019). NCBI accession metadata for 18S rRNA gene tag sequences from DNA and RNA from samples collected in coastal California in 2013 and 2014 [Dataset]. https://darchive.mblwhoilibrary.org/entities/publication/438f7d51-f9e5-5c8d-b797-b10f4b04156a
    Explore at:
    Available download formats: pdf, xml, text/tsv, zip, txt
    Dataset updated
    Jul 24, 2019
    Dataset provided by
    Biological and Chemical Oceanography Data Management Office (BCO-DMO). Contact: bco-dmo-data@whoi.edu
    Authors
    Sarah K Hu; David Caron
    Area covered
    Description

    NSF Division of Ocean Sciences (NSF OCE) OCE-1737409

  13.

    Metagenomic detection and reconstruction of Lake Sinai Virus from honey bee...

    • s.cnmilf.com
    • data.usgs.gov
    • +1more
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Metagenomic detection and reconstruction of Lake Sinai Virus from honey bee sequence data [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/metagenomic-detection-and-reconstruction-of-lake-sinai-virus-from-honey-bee-sequence-data
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Lake Sinai Township
    Description

    A survey of public honey bee sequence data was performed to detect infections by Lake Sinai Virus (LSV). The Sequence Read Archive of the National Center for Biotechnology Information (NCBI) was queried to identify accessions of RNA sequence data derived from honey bee. These were filtered as described below, and then up to 50 million reads or read pairs were downloaded and searched against a reference database of conserved LSV sequence. Accessions with matches above a specified threshold were downloaded in their entirety and assembled into longer contiguous sequences (contigs). The resulting contigs were searched against each open reading frame (ORF) of the reference LSV genome present in the NCBI database (accession NC_032433.1), and matching regions were extracted from each contig. These ORF sequences were aligned with additional sequences identified in NCBI databases through the BLAST web service. These alignments provide the basis for computing phylogenetic trees, rates of nucleotide substitution, codon usage bias, and other evolutionary parameters.

  14.

    Genome assemblies and respective wg/cgMLST profiles of a diverse dataset...

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Jul 24, 2023
    Cite
    Simon Tausch (2023). Genome assemblies and respective wg/cgMLST profiles of a diverse dataset comprising 1,434 Salmonella enterica isolates [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_7119735
    Explore at:
    Dataset updated
    Jul 24, 2023
    Dataset provided by
    Daniel Sobral
    Verónica Mixão
    Miguel Pinto
    Carlus Deneke
    Simon Tausch
    Vítor Borges
    João Paulo Gomes
    Holger Brendebach
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset comprises the genome assemblies and respective 8,558-loci whole-genome (wg) Multiple Locus Sequence Type (MLST) profiles [INNUENDO schema (Llarena et al. 2018) available in chewie-NS (Mamede et al. 2022)] of a final set of 1,434 Salmonella enterica samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) at the beginning of the analysis (November 2021). This set of samples was carefully selected to cover a wide genetic diversity (assessed in terms of serotype). In total, 125 different serotypes are represented in this dataset, with Typhimurium (including monophasic), Enteritidis and Infantis being the most represented ones and, together, corresponding to 56.2% of the dataset.

    File “Se_metadata.xlsx” contains metadata information for each isolate, including ENA/SRA accession number, BioProject and in-silico MLST ST and serotype.

    The directory “assemblies/” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.

    The file “profiles/Se_profiles_wgMLST.tsv” is a tab-separated file with the 8,558-loci wgMLST profiles of each isolate presented in the metadata file. The files “profiles/Se_profiles_cgMLST_95.tsv”, “profiles/Se_profiles_cgMLST_98.tsv” and “profiles/Se_profiles_cgMLST_100.tsv” contain the 3,261-loci, 3,179-loci and 874-loci cgMLST profiles of each isolate presented in the metadata file, respectively. These profiles were determined as explained below.

    Dataset selection and curation

    With the objective of creating a diverse dataset of S. enterica genome assemblies, we collected information about the genetic diversity (serotype) of the isolates available in the Enterobase database at the beginning of this analysis (November 2021) and in other previous works. Based on this information, we selected an initial dataset comprising 1,779 samples associated with four BioProjects (PRJEB16326, PRJEB20997, PRJEB30335 and PRJEB39988). Their WGS data was downloaded from ENA/SRA with fastq-dl v1.0.6. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the others, we noticed that a considerable proportion of assemblies was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 1,434 isolates passed this curation step and were included in the final dataset. In-silico serotyping was performed with SeqSero2 v1.2.1 (Zhang et al. 2019). wgMLST profiles of each of these isolates were determined with chewBBACA v2.8.5 (Silva et al. 2018), using the 8,558-loci INNUENDO schema available in chewie-NS (Llarena et al. 2018; Mamede et al. 2022) and downloaded on May 31st, 2022. Three cgMLST schemas were obtained with ReporTree v1.0.0 (Mixão et al. 2022) using the 8,558-loci wgMLST profiles of the 1,434 isolates as input and setting distinct “--site-inclusion” thresholds: 0.95, 0.98 and 1.0 (i.e., keeping schema loci called in at least 95%, 98% and 100% of the samples, which resulted in 3,261-loci, 3,179-loci and 874-loci allelic matrices, respectively).
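
    The site-inclusion filtering described above (keep a locus only if it is called in at least a given fraction of samples) can be illustrated with a minimal sketch. This is not ReporTree's code: the function name, the in-memory profile layout and the use of "0" as the marker for an uncalled locus are assumptions made for the example, and the toy profiles are invented.

```python
from typing import Dict, List

MISSING = "0"  # assumed placeholder for a locus that was not called


def filter_loci(profiles: Dict[str, Dict[str, str]],
                site_inclusion: float) -> List[str]:
    """Return the loci called in at least `site_inclusion` of the samples."""
    samples = list(profiles)
    loci = list(next(iter(profiles.values())))  # assumes all samples share loci
    kept = []
    for locus in loci:
        called = sum(1 for s in samples if profiles[s][locus] != MISSING)
        if called / len(samples) >= site_inclusion:
            kept.append(locus)
    return kept


# Toy allelic profiles: sample -> locus -> allele identifier (invented values).
profiles = {
    "s1": {"locA": "1", "locB": "2", "locC": "0"},
    "s2": {"locA": "1", "locB": "0", "locC": "0"},
    "s3": {"locA": "3", "locB": "4", "locC": "5"},
    "s4": {"locA": "2", "locB": "4", "locC": "0"},
}

core_100 = filter_loci(profiles, 1.0)   # loci called in every sample
core_75 = filter_loci(profiles, 0.75)   # loci called in at least 3 of 4 samples
```

    Stricter thresholds can only shrink the retained set, which matches the decreasing locus counts reported for the 0.95, 0.98 and 1.0 thresholds.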

    Acknowledgements

    We thank the National Distributed Computing Infrastructure of Portugal (INCD) for providing the necessary resources to run the genome assemblies. INCD was funded by FCT and FEDER under the project 22153-01/SAICT/2016.

  15.

    Data relating to RNA sequence accessions at NCBI from Ross Sea...

    • search.dataone.org
    • bco-dmo.org
    Updated Dec 5, 2021
    Cite
    Rebecca J. Gast (2021). Data relating to RNA sequence accessions at NCBI from Ross Sea Dinoflagellates, Phaeocystis antarctica, Pyramimons tychotreta, and Micromonas polaris (CCMP 2099) (Kleptoplasty project) [Dataset]. https://search.dataone.org/view/http%3A%2F%2Flod.bco-dmo.org%2Fid%2Fdataset%2F728427
    Explore at:
    Dataset updated
    Dec 5, 2021
    Dataset provided by
    Biological and Chemical Oceanography Data Management Office (BCO-DMO)
    Authors
    Rebecca J. Gast
    Time period covered
    Dec 1, 1997 - Apr 7, 1998
    Area covered
    South Pacific Ocean, Pacific Ocean
    Description

    This dataset contains data related to RNA sequence genetic accessions at the National Center for Biotechnology Information (NCBI) including information about the host organism, collection location, and collection date.

    The accessions are the unprocessed Illumina MiSeq reads for the Ross Sea Dinoflagellate RNA-Seq experiments, Phaeocystis antarctica RNA-Seq experiments, and Pyramimonas tychotreta & Micromonas polaris (CCMP 2099) mixotrophy experiments.

    Pyramimonas tychotreta & Micromonas polaris (CCMP 2099) mixotrophy RNA sequences are available through the NCBI Sequence Read Archive (SRA) under the accession number SRP090401 (BioProject PRJNA342459).

    Ross Sea Dinoflagellate RNA sequences are available through the NCBI Sequence Read Archive (SRA) under the accession number SRP132912 (BioProject PRJNA428208).

    Phaeocystis antarctica RNA sequences are available through the NCBI Sequence Read Archive (SRA) under the accession number SRP133243 (BioProject PRJNA434497).

  16.

    Genome assemblies and respective wg/cgMLST profiles of a diverse dataset...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 24, 2023
    Cite
    Simon Tausch (2023). Genome assemblies and respective wg/cgMLST profiles of a diverse dataset comprising 1,999 Escherichia coli isolates [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7120057
    Explore at:
    Dataset updated
    Jul 24, 2023
    Dataset provided by
    Daniel Sobral
    Verónica Mixão
    Miguel Pinto
    Carlus Deneke
    Simon Tausch
    Vítor Borges
    João Paulo Gomes
    Holger Brendebach
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset comprises the genome assemblies and respective 7,601-loci whole-genome (wg) Multiple Locus Sequence Type (MLST) profiles [INNUENDO schema (Llarena et al. 2018) available in chewie-NS (Mamede et al. 2022)] of a final set of 1,999 Escherichia coli samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) at the beginning of the analysis (November 2021). This set of samples was carefully selected to cover a wide genetic diversity (assessed in terms of serotype). In total, 411 different serotypes are represented in this dataset, with O157:H7 being the most represented one, corresponding to 37.1% of the dataset.

    File “Ec_metadata.xlsx” contains metadata information for each isolate, including ENA/SRA accession number, BioProject and in-silico MLST ST and serotype.

    The directory “assemblies/” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.

    The file “profiles/Ec_profiles_wgMLST.tsv” is a tab-separated file with the 7,601-loci wgMLST profiles of each isolate presented in the metadata file. The files “profiles/Ec_profiles_cgMLST_95.tsv”, “profiles/Ec_profiles_cgMLST_98.tsv” and “profiles/Ec_profiles_cgMLST_100.tsv” contain the 2,826-loci, 2,704-loci and 465-loci cgMLST profiles of each isolate presented in the metadata file, respectively. These profiles were determined as explained below.

    Dataset selection and curation

    With the objective of creating a diverse dataset of E. coli genome assemblies, we collected information about the genetic diversity (serotype) of the isolates available in the Enterobase database at the beginning of this analysis (November 2021) and in other previous works. Based on this information, we selected an initial dataset comprising 2,688 samples associated with three BioProjects (PRJNA230969, PRJEB27020 and PRJNA248042). Their WGS data was downloaded from ENA/SRA with fastq-dl v1.0.6. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the others, we noticed that a considerable proportion of assemblies was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 1,999 isolates passed this curation step and were included in the final dataset. In-silico serotyping was performed with seq_typing v2.2. wgMLST profiles of each of these isolates were determined with chewBBACA v2.8.5 (Silva et al. 2018), using the 7,601-loci INNUENDO schema available in chewie-NS (Llarena et al. 2018; Mamede et al. 2022) and downloaded on May 31st, 2022. Three cgMLST schemas were obtained with ReporTree v1.0.0 (Mixão et al. 2022) using the 7,601-loci wgMLST profiles of the 1,999 isolates as input and setting distinct “--site-inclusion” thresholds: 0.95, 0.98 and 1.0 (i.e., keeping schema loci called in at least 95%, 98% and 100% of the samples, which resulted in 2,826-loci, 2,704-loci and 465-loci allelic matrices, respectively).

  17.

    Coral gene expression Sequence Read Archive (SRA) accession numbers and...

    • search.dataone.org
    • darchive.mblwhoilibrary.org
    • +1more
    Updated Mar 9, 2025
    Cite
    Sarah W. Davies (2025). Coral gene expression Sequence Read Archive (SRA) accession numbers and information for samples collected at the Flower Garden Banks National Marine Sanctuary in the Gulf of Mexico in September and October of 2017 to capture effects of Hurricane Harvey [Dataset]. https://search.dataone.org/view/sha256%3Adc277481f592102ecdb5e6c56d30acd633733ed8971fc5e9b1481c81551a9bab
    Explore at:
    Dataset updated
    Mar 9, 2025
    Dataset provided by
    Biological and Chemical Oceanography Data Management Office (BCO-DMO)
    Authors
    Sarah W. Davies
    Time period covered
    Sep 1, 2017 - Oct 1, 2017
    Area covered
    Description

    To capture the immediate effects of storm-driven freshwater runoff on coral and symbiont physiology, we leveraged the heavy rainfall associated with Hurricane Harvey in late August 2017 by sampling FGB coral gene expression at two time points: September 2017, when surface water salinity was reduced (∼34 ppt); and 1 month later when salinity had returned to typical levels (∼36 ppt in October 2017).

    This dataset includes Sequence Read Archive (SRA) and BioSample accessions under BioProject PRJNA552981 at the National Center for Biotechnology Information. It also contains sample information and species names for samples collected at the east and west banks of the Flower Garden Banks National Marine Sanctuary (FGBNMS) at 80 ft.

    These data were published in Wright et al. (2019).

  18.

    Reduced representation sequencing and genotyping of Arizona Toads (Anaxyrus...

    • s.cnmilf.com
    • data.usgs.gov
    • +1more
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Reduced representation sequencing and genotyping of Arizona Toads (Anaxyrus microscaphus) from the southwestern United States [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/reduced-representation-sequencing-and-genotyping-of-arizona-toads-anaxyrus-microscaphus-fr
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Southwestern United States, United States
    Description

    The dataset consists of genotypes (diploid base calls at variant sites) at 3,601 anonymous sites of the Arizona Toad (Anaxyrus microscaphus) nuclear genome. The genotyped samples are representative of the range of the species and its major population units, and the genotyped loci have a high degree of completeness. This data release consists of several files: 1. sample.metadata.txt, which contains sampling metadata and identifiers linking to sequence data that has been deposited in the Sequence Read Archive of the National Center for Biotechnology Information (NCBI). This database is authoritative and comprehensive for sharing high-throughput sequence data produced with public funds. All NCBI-derived accessions listed in the file can be searched at www.ncbi.nlm.nih.gov to retrieve sample and sequence information, as can the umbrella BioProject accession PRJNA995169. 2. genotypes.genepop.txt, which contains inferred genetic variants in a common and convertible text-based format.

  19.

    Sample information and genetic accession information for raw low-coverage...

    • darchive.mblwhoilibrary.org
    • search.dataone.org
    • +1more
    csv, pdf, xml
    Updated Jan 2, 2024
    Cite
    Biological and Chemical Oceanography Data Management Office (BCO-DMO). Contact: bco-dmo-data@whoi.edu (2024). Sample information and genetic accession information for raw low-coverage genomic sequence reads from 248 different Atlantic silverside (Menidia menidia) collected along the east coast of North America between 2005 to 2007 [Dataset]. https://darchive.mblwhoilibrary.org/entities/publication/da6c3e96-3253-41ec-86d2-1ebfb6e24031
    Explore at:
    Available download formats: pdf, xml, csv
    Dataset updated
    Jan 2, 2024
    Dataset provided by
    Biological and Chemical Oceanography Data Management Office (BCO-DMO). Contact: bco-dmo-data@whoi.edu
    Area covered
    Description

    Dataset: Raw low-coverage whole genome sequencing reads

  20. Data from: Raptor roosts as invasion archives: insights from the first black...

    • agdatacommons.nal.usda.gov
    bin
    Updated Mar 12, 2025
    Cite
    University of Puerto Rico Mayaguez (2025). Raptor roosts as invasion archives: insights from the first black rat mitochondrial genome sequenced from the Caribbean [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/Raptor_roosts_as_invasion_archives_insights_from_the_first_black_rat_mitochondrial_genome_sequenced_from_the_Caribbean/25089047/1
    Explore at:
    Available download formats: bin
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/)
    Authors
    University of Puerto Rico Mayaguez
    License

    https://rightsstatements.org/vocab/UND/1.0/

    Area covered
    Caribbean
    Description

    Raptor roosts as invasion archives: insights from the first black rat mitochondrial genome sequenced from the Caribbean
