75 datasets found
  1. GenBank

    • data.wu.ac.at
    • data.virginia.gov
    Updated Jul 19, 2016
    Cite
    U.S. Department of Health & Human Services (2016). GenBank [Dataset]. https://data.wu.ac.at/schema/data_gov/ZTY5YzNkYjUtZGRlZC00NzRmLThjY2YtYzI1MjAyYzhmNzI0
    Dataset updated
    Jul 19, 2016
    Dataset provided by
    U.S. Department of Health & Human Services
    Description

    GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. GenBank is designed to provide and encourage access within the scientific community to the most up-to-date and comprehensive DNA sequence information.

  2. Recovered mitochondrial genome sizes and their respective GenBank accession...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Silvia Andrade Justi; John Soghigian; David B. Pecor; Laura Caicedo-Quiroga; Wiriya Rutvisuttinunt; Tao Li; Lori Stevens; Patricia L. Dorn; Brian Wiegmann; Yvonne-Marie Linton (2023). Recovered mitochondrial genome sizes and their respective GenBank accession numbers. [Dataset]. http://doi.org/10.1371/journal.pone.0247068.t003
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Silvia Andrade Justi; John Soghigian; David B. Pecor; Laura Caicedo-Quiroga; Wiriya Rutvisuttinunt; Tao Li; Lori Stevens; Patricia L. Dorn; Brian Wiegmann; Yvonne-Marie Linton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Recovered mitochondrial genome sizes and their respective GenBank accession numbers.

  3. Gomphid DNA sequence data

    • catalog.data.gov
    • data.amerigeoss.org
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Gomphid DNA sequence data [Dataset]. https://catalog.data.gov/dataset/gomphid-dna-sequence-data
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency: http://www.epa.gov/
    Description

    DNA sequence data for several genetic loci. This dataset is not publicly accessible through this catalog because it is already publicly available on GenBank, where it can be accessed through GenBank/NCBI (http://www.ncbi.nlm.nih.gov/) under accession numbers KX890490-KX891168. Format: DNA sequence data. This dataset is associated with the following publication: Ware, J., E. Pilgrim, M. May, N. Donnelly, and K. Tennessen. Phylogenetic relationships of North American Gomphidae and their close relatives. Systematic Entomology. John Wiley & Sons, Inc., Hoboken, NJ, USA, 42(2): 347-358, (2017).
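    The accession range KX890490-KX891168 can be expanded into individual accession numbers, for example to prepare a batch request to NCBI. The following is a minimal Python sketch, assuming the range is contiguous (every number between the two ends belongs to this submission) and that both ends share the same letter prefix and digit width; `expand_accession_range` is a hypothetical helper, not part of any NCBI tool.

```python
import re


def expand_accession_range(first: str, last: str) -> list[str]:
    """Expand an accession range such as KX890490-KX891168 into single accessions.

    Assumes both ends share a letter prefix and digit width and that the
    range is contiguous.
    """
    m_first = re.fullmatch(r"([A-Z]+)(\d+)", first)
    m_last = re.fullmatch(r"([A-Z]+)(\d+)", last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError("range ends must share the same letter prefix")
    prefix, width = m_first.group(1), len(m_first.group(2))
    start, end = int(m_first.group(2)), int(m_last.group(2))
    if end < start:
        raise ValueError("range end precedes range start")
    # Zero-pad so every accession keeps the original digit width.
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


accessions = expand_accession_range("KX890490", "KX891168")
print(len(accessions), accessions[0], accessions[-1])  # 679 KX890490 KX891168
```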

  4. DNA_coding_regions

    • huggingface.co
    Updated Nov 12, 2025
    Cite
    Gustavo Henrique Ferreira Cruz (2025). DNA_coding_regions [Dataset]. https://huggingface.co/datasets/GustavoHCruz/DNA_coding_regions
    Dataset updated
    Nov 12, 2025
    Authors
    Gustavo Henrique Ferreira Cruz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    DNA Coding Regions Dataset

    This is a curated collection of genomic sequences extracted directly from NCBI GenBank, designed to support research in intron and exon classification, DNA-to-protein translation, gene structure analysis, and biological sequence modeling with deep learning architectures.

    Source and Extraction Pipeline

    All records were extracted from GenBank using Biopython. The dataset construction followed a reproducible data processing pipeline written in… See the full description on the dataset page: https://huggingface.co/datasets/GustavoHCruz/DNA_coding_regions.
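    One of the uses listed for this dataset is DNA-to-protein translation. As a rough illustration (not the dataset's own pipeline, which is described only on the dataset page), the following Python sketch translates a coding sequence with the standard genetic code (NCBI translation table 1) using only the standard library. Biopython's `Seq.translate()` offers the same operation plus alternative translation tables; the `translate` function below is a hypothetical stand-in.

```python
# Standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str, to_stop: bool = False) -> str:
    """Translate DNA to protein; '*' marks a stop codon, 'X' an unknown codon.

    A trailing partial codon is ignored. With to_stop=True, translation
    halts at the first stop codon, which is not included in the result.
    """
    cds = cds.upper()
    protein = []
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE.get(cds[pos:pos + 3], "X")
        if to_stop and amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTGGTAA"))                # MAW*
print(translate("ATGGCCTGGTAA", to_stop=True))  # MAW
```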

  5. Amount of total genomic DNA extracted and amount of raw data (i.e., raw...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 12, 2023
    Cite
    Silvia Andrade Justi; John Soghigian; David B. Pecor; Laura Caicedo-Quiroga; Wiriya Rutvisuttinunt; Tao Li; Lori Stevens; Patricia L. Dorn; Brian Wiegmann; Yvonne-Marie Linton (2023). Amount of total genomic DNA extracted and amount of raw data (i.e., raw reads in gigabases, 10^9 bp) obtained per specimen. [Dataset]. http://doi.org/10.1371/journal.pone.0247068.t002
    Available download formats: xls
    Dataset updated
    Jun 12, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Silvia Andrade Justi; John Soghigian; David B. Pecor; Laura Caicedo-Quiroga; Wiriya Rutvisuttinunt; Tao Li; Lori Stevens; Patricia L. Dorn; Brian Wiegmann; Yvonne-Marie Linton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Amount of total genomic DNA extracted and amount of raw data (i.e., raw reads in gigabases, 10^9 bp) obtained per specimen.

  6. Inventory of soil prokaryotic microbiome (via 16S based on rRNA gene...

    • search.dataone.org
    • portal.edirepository.org
    Updated Feb 21, 2024
    Cite
    Jun Zhao; Willm Martens-Habbena (2024). Inventory of soil prokaryotic microbiome (via 16S based on rRNA gene amplicons) in freshwater and brackish water marshes following saltwater intrusion along Shark River Slough boundary, Everglades National Park (FCE LTER), Florida, USA, September 2018 [Dataset]. https://search.dataone.org/view/https%3A%2F%2Fpasta.lternet.edu%2Fpackage%2Fmetadata%2Feml%2Fknb-lter-fce%2F1264%2F1
    Dataset updated
    Feb 21, 2024
    Dataset provided by
    Long Term Ecological Research Network: http://www.lternet.edu/
    Authors
    Jun Zhao; Willm Martens-Habbena
    Time period covered
    Jan 1, 2018
    Variables measured
    run, bases, bytes, depth, consent, version, latitude, organism, platform, run_link, and 29 more
    Description

    Global sea-level rise is transforming coastal ecosystems, especially freshwater wetlands, in part due to increased episodic or chronic saltwater exposure, leading to shifts in microbial communities and related ecological services. Soil prokaryotes play a fundamental role in regulating important biogeochemical processes in coastal wetland ecosystems. Yet it is still difficult to predict how soil prokaryotic communities respond to saltwater exposure, because prokaryotic sensitivity within complex wetland soil microbial communities is poorly understood and because wetland soils and saltwater exposure are highly heterogeneous. To address this, a four-year experimental simulation of saltwater intrusion was conducted at a pristine freshwater site and at a previously saltwater-impacted site. Monthly saltwater additions started in October 2014 and continued through October 2018. The dataset contains amplicon sequencing data of the 16S rRNA gene obtained from saltwater-exposed soils and unmanipulated native soils at both sites (collected in September 2018). The 2018 data are published in Zhao et al. 2023. A detailed list of sequence data and their accession numbers in GenBank is provided, and data collection is complete. This data package is an inventory of sequence read archive (SRA) entries available through GenBank BioProject PRJNA804545 (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA804545). This data package is associated with the following publication: Zhao, J., Chakrabarti, S., Chambers, R., Weisenhorn, P., Travieso, R., Stumpf, S., Standen, E., Briceno, H., Troxler, T., Gaiser, E., Kominoski, J., Dhillon, B., & Martens-Habbena, W. (2023). Year-around survey and manipulation experiments reveal differential sensitivities of soil prokaryotic and fungal communities to saltwater intrusion in Florida Everglades wetlands. Science of The Total Environment, 858, 159865, https://doi.org/10.1016/j.scitotenv.2022.159865. Instead of citing this package, which is an inventory, please cite the original GenBank data or journal article, as appropriate. Citation guidance for the journal article is available on the respective publisher's website.

  7. Whole genome DNA sequences of Gulf of Mexico invertebrates

    • data.griidc.org
    • search.dataone.org
    Updated Aug 5, 2020
    Cite
    W. Kelley Thomas (2020). Whole genome DNA sequences of Gulf of Mexico invertebrates [Dataset]. http://doi.org/10.7266/n7-pchj-dh15
    Dataset updated
    Aug 5, 2020
    Dataset provided by
    GRIIDC
    Authors
    W. Kelley Thomas
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The dataset consists of whole genome DNA sequences generated from invertebrate species from the Gulf of Mexico during the Benthic Invertebrate Taxonomy, Metagenomics, and Bioinformatics Workshop (BITMaB) in 2017 in Corpus Christi, Texas, USA. All genomic data sets were deposited in and distributed by GenBank (NCBI), the European Nucleotide Archive (ENA) at the European Bioinformatics Institute (EMBL-EBI), the DNA Data Bank of Japan, NemATOL, the Global Genome Initiative, and Ocean Genome Legacy.

  8. Data from: Genome datasets for Calonectria henricotiae and C....

    • catalog.data.gov
    • datasets.ai
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). Genome datasets for Calonectria henricotiae and C. pseudonaviculata causing boxwood blight disease and related fungal species [Dataset]. https://catalog.data.gov/dataset/genome-datasets-for-calonectria-henricotiae-and-c-pseudonaviculata-causing-boxwood-blight--7ddf9
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    Boxwood blight disease, caused by the fungi Calonectria henricotiae and C. pseudonaviculata, is an emergent threat to natural and managed landscapes worldwide. Boxwood blight emerged for the first time in the U.K. during the 1990s, then spread rapidly throughout Europe. By 2011, the fungus that causes the disease, Calonectria pseudonaviculata, was found in the U.S., threatening an industry valued at $103 million annually and countless mature landscapes, some dating back to early Colonial times. Since the first U.S. outbreaks, boxwood blight has been identified in a total of 19 states that together comprise 62% of total U.S. boxwood production. A second pathogen, C. henricotiae, was recently described from five European countries. Infection can be latent, and the pathogen may sequester in less susceptible boxwood cultivars. Because there are no curative treatments (fungicides at best suppress symptoms), infected plants are rendered unfit for sale. If infected plants are not destroyed, they provide a long-lived source of inoculum that spreads the pathogen by spores or resistant survival structures in soil, air, or water. Our goal is to provide knowledge and tools needed to reduce the impact of boxwood blight on the green industry. This database includes genome datasets from Calonectria pathogens of boxwood and related species.

    Resources in this dataset:

    • Genome assembly, Calonectria pseudonaviculata CBS 139707 (aka cpsCT01). File name: CT1_ALLPATHS.scaffolds_FINAL.txt. This assembly is also accessioned on NCBI GenBank under accession number PGGA00000000. The genome is contained in 27 contigs derived from Illumina and PacBio reads; depth of coverage 285x.
    • Draft genome assembly, Calonectria henricotiae NL009. File name: NL009contigs_FINAL111517.txt. A partial genome assembly for Calonectria henricotiae isolate NL009, also accessioned on NCBI GenBank as PGSF00000000. Sequences were generated using Illumina MiSeq; total depth of coverage 34x.
    • Draft genome assembly, Calonectria henricotiae isolate CB077. File name: CB77contigs_subset_FINAL111517.txt. A partial genome assembly of Calonectria henricotiae CB077, also accessioned on NCBI GenBank as PGSE00000000. Sequences were generated using an Illumina MiSeq; depth of coverage 30x.
    • Genome assembly, Calonectria leucothoes CBS 109166. File name: Cleucothes _contigs_032017REV1.txt. Draft genome assembly from Calonectria leucothoes CBS 109166, also accessioned on NCBI GenBank as NAJI00000000. Sequences were generated using an Illumina MiSeq; average depth of coverage 124x.
    • Genome assembly, Calonectria naviculata CBS 101121. File name: Cnaviculata_CBS101121 contig list_031017_FINAL2 (1).txt. Draft genome assembly of Calonectria naviculata CBS 101121, also accessioned on NCBI GenBank as NAGG00000000. The genome was generated using an Illumina MiSeq; average depth of coverage 84x.
    • Calonectria pseudonaviculata CBS 139707, gene model predictions (CDS). File name: CpseudonaviculataPredicted_CT1_CDS.txt. CDS file of gene models predicted using CodingQuarry 2.
    • Calonectria pseudonaviculata CBS 139707, GFF file for gene model predictions. File name: CpseudonaviculataCT1PredictedPass.gff3.txt. Gene model predictions and GFF file generated using CodingQuarry 2.
    • Draft genome assembly, Calonectria henricotiae CBS 138102 (aka CB045). File name: CB45Jelly1b_FINAL111717_FINAL.txt. A draft genome assembly for Calonectria henricotiae CBS 138102; these data are also accessioned on NCBI GenBank as JYJY00000000.
    • Genome assembly, Calonectria pseudonaviculata CBS 139394. File name: CBS 139394.txt. The draft genome assembly for Calonectria pseudonaviculata CBS 139394. The assembly is accessioned with NCBI GenBank as JYJY00000000.
    • Draft genome assembly, Calonectria pseudonaviculata CBS 114417. File name: cbs114417contigs_FINAL_final112017.txt.
    • Draft genome assembly, Calonectria pseudonaviculata cpsCT13. File name: cpsCT13_Final_FINAL111717.txt.
    • Draft genome assembly, Calonectria pseudonaviculata ICMP 14368. File name: ICMP14368FINAL112017_final.txt.
    • Draft genome assembly, Calonectria pseudonaviculata NC-BB1. File name: NCBB1contigs_FINAL_final112017.txt.
    • Draft genome assembly, Calonectria pseudonaviculata ODA1. File name: ODA1contigs_FINAL112117final.txt.
    • Draft genome assembly, Calonectria henricotiae NL017. File name: NL017contigs_FINAL111717.txt.
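    The depth-of-coverage figures for these assemblies (for example 285x or 34x) express how many times, on average, each base of the genome was sequenced. Under the usual Lander-Waterman approximation, mean depth is C = N·L/G for N reads of length L over a genome of size G. A minimal Python sketch follows; the read count, read length, and genome size in the example are hypothetical and are not taken from the Calonectria assemblies.

```python
import math


def mean_depth(read_count: int, read_length_bp: float, genome_size_bp: float) -> float:
    """Expected mean depth of coverage, C = N * L / G (Lander-Waterman)."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return read_count * read_length_bp / genome_size_bp


def reads_needed(target_depth: float, read_length_bp: float, genome_size_bp: float) -> int:
    """Number of reads required to reach a target mean depth, rounded up."""
    return math.ceil(target_depth * genome_size_bp / read_length_bp)


# Hypothetical numbers: 4,000,000 reads of 250 bp against a 50 Mb genome.
print(mean_depth(4_000_000, 250, 50_000_000))  # 20.0
print(reads_needed(30, 250, 50_000_000))       # 6000000
```

    Real depth is uneven across a genome, so the value from this formula is an average, not a guarantee for any particular region.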

  9. Data from: Dissecting the molecular signatures of apical cell-type shoot...

    • datasetcatalog.nlm.nih.gov
    • data.niaid.nih.gov
    Updated Mar 12, 2015
    Cite
    McKain, Michael; Scanlon, Michael; Frank, Margaret; Fei, Zhangjun; Schultz, Eric; Edwards, Molly; Sorensen, Iben; Rose, Jocelyn (2015). Dissecting the molecular signatures of apical cell-type shoot meristems from two ancient land plant lineages [Dataset]. http://doi.org/10.7910/DVN/29464
    Dataset updated
    Mar 12, 2015
    Authors
    McKain, Michael; Scanlon, Michael; Frank, Margaret; Fei, Zhangjun; Schultz, Eric; Edwards, Molly; Sorensen, Iben; Rose, Jocelyn
    Description

    Preparation and assembly of the Equisetum arvense transcriptome. Young stems were collected from a wild E. arvense population growing in Ithaca, NY, and frozen in liquid nitrogen. RNA was extracted (Wan & Wilkins, 1994), and 500 ng of total RNA was selected and amplified using the TargetAmp aRNA amplification kit (Epicentre, Madison, WI, U.S.A.). The product was purified using the RNeasy Mini Kit (Qiagen, Germantown, MD, U.S.A.), and cDNA libraries were made using the SuperScript™ Choice System (Life Technologies, Carlsbad, CA, U.S.A.), with a mix of polyT and random hexamer DNA primers for first-strand synthesis and only random hexamers for the second strand. cDNAs were purified using the PureLink PCR Purification Kit (Life Technologies, Carlsbad, CA, U.S.A.), and libraries were prepared for 454 pyrosequencing using a 454 Genome Sequencer FLX system with titanium chemistry, according to the manufacturer’s instructions (Roche Diagnostics, Indianapolis, IN, U.S.A.), and then sequenced at the Cornell University BRC DNA sequencing facility (http://cores.lifesciences.cornell.edu/brcinfo/?f=1). The raw sequence files in SFF format were base called using the Pyrobayes base caller (Quinlan et al., 2008). The sequences were then processed to remove low-quality regions and adaptor sequences using the programs LUCY (Chou & Holmes, 2001) and SeqClean (https://github.com/gentoo-science/sci/blob/master/sci-biology/seqclean/seqclean-110625.ebuild). The resulting high-quality sequences were then screened against the NCBI UniVec database and E. coli genome sequences to remove possible contamination. Sequences shorter than 30 base pairs were discarded. The processed high-quality sequences were assembled de novo using iAssembler (Zheng et al., 2011). After assembly, the unigenes were annotated by BLAST searches against the GenBank (http://www.ncbi.nlm.nih.gov/genbank) non-redundant protein (nr) database with an e-value cutoff of 1e-5.
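    Two of the thresholds in this protocol are easy to express in code: discarding sequences shorter than 30 base pairs, and keeping BLAST hits whose e-value passes the 1e-5 cutoff. The following Python sketch is an illustration under assumptions, not the authors' LUCY/SeqClean/iAssembler workflow: it uses a minimal hand-written FASTA reader and represents BLAST hits as hypothetical `(query, subject, evalue)` tuples. The protocol does not say whether the cutoff is inclusive; the sketch treats it as inclusive.

```python
# Thresholds taken from the protocol text.
MIN_LENGTH_BP = 30
EVALUE_CUTOFF = 1e-5


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}; minimal, no validation."""
    records: dict[str, str] = {}
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def drop_short(records: dict[str, str], min_length: int = MIN_LENGTH_BP) -> dict[str, str]:
    """Discard sequences shorter than min_length base pairs."""
    return {name: seq for name, seq in records.items() if len(seq) >= min_length}


def significant_hits(hits, cutoff: float = EVALUE_CUTOFF):
    """Keep (query, subject, evalue) tuples whose e-value is at or below the cutoff."""
    return [hit for hit in hits if hit[2] <= cutoff]


# Hypothetical example data.
fasta = ">read_short\nACGTACGT\n>read_long\n" + "ACGT" * 10 + "\n"
kept = drop_short(read_fasta(fasta))
hits = [("unigene_1", "protein_A", 3e-40), ("unigene_2", "protein_B", 0.02)]
print(sorted(kept))                               # ['read_long']
print([q for q, _, _ in significant_hits(hits)])  # ['unigene_1']
```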

  10. Summary statics for the genome assembly of each specimen: Total contigs in...

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 9, 2023
    Cite
    Silvia Andrade Justi; John Soghigian; David B. Pecor; Laura Caicedo-Quiroga; Wiriya Rutvisuttinunt; Tao Li; Lori Stevens; Patricia L. Dorn; Brian Wiegmann; Yvonne-Marie Linton (2023). Summary statics for the genome assembly of each specimen: Total contigs in the assembly (N), contiguity of the assembly (N50), sum of contig lengths (SUM). [Dataset]. http://doi.org/10.1371/journal.pone.0247068.t004
    Available download formats: xls
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Silvia Andrade Justi; John Soghigian; David B. Pecor; Laura Caicedo-Quiroga; Wiriya Rutvisuttinunt; Tao Li; Lori Stevens; Patricia L. Dorn; Brian Wiegmann; Yvonne-Marie Linton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary statistics for the genome assembly of each specimen: total contigs in the assembly (N), contiguity of the assembly (N50), and sum of contig lengths (SUM).
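    The three statistics in this table can be computed directly from a list of contig lengths. N50 is the length of the contig at which, after sorting contigs from longest to shortest, the running total first reaches at least half of the total assembly length (SUM). A minimal Python sketch; `assembly_stats` is a hypothetical helper and the contig lengths in the example are made up.

```python
def assembly_stats(contig_lengths: list[int]) -> dict[str, int]:
    """Return N (contig count), N50, and SUM (total length) for an assembly."""
    if not contig_lengths:
        raise ValueError("assembly has no contigs")
    total = sum(contig_lengths)
    running = 0
    n50 = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Integer comparison avoids rounding issues with total / 2.
        if 2 * running >= total:
            n50 = length
            break
    return {"N": len(contig_lengths), "N50": n50, "SUM": total}


print(assembly_stats([2, 3, 4, 5, 6, 7, 8, 9, 10]))
# {'N': 9, 'N50': 8, 'SUM': 54}
```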

  11. Genome Sequence Data Set02

    • s.cnmilf.com
    • catalog.data.gov
    Updated Mar 15, 2021
    Cite
    U.S. EPA Office of Research and Development (ORD) (2021). Genome Sequence Data Set02 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/genome-sequence-data-set02
    Dataset updated
    Mar 15, 2021
    Dataset provided by
    United States Environmental Protection Agency: http://www.epa.gov/
    Description

    The Whole Genome Shotgun project has been deposited in DDBJ/ENA/GenBank under the BioProject PRJNA487286 with the following accession numbers CP061840 (chromosome) and CP061841 (plasmid). The raw sequence reads have been submitted to the NCBI SRA under the accession numbers SRR13076822 and SRR13076823. This dataset is associated with the following publication: Gomez-Alvarez, V., L. Boczek, I. Raffenberg, and R. Revetta. Closed Genome and Plasmid Sequences of Legionella pneumophila AW-13-4, Isolated from a Hot Water Loop System of a Large Occupational Building. Microbiology Resource Announcements. American Society for Microbiology, Washington, DC, USA, 10(1): e01276-20, (2021).

  12. Inventory of soil prokaryotic and fungal microbiome (via 16S rRNA gene...

    • search.dataone.org
    • dataone.org
    Updated Feb 21, 2024
    Cite
    Jun Zhao; Willm Martens-Habbena (2024). Inventory of soil prokaryotic and fungal microbiome (via 16S rRNA gene amplicons and ITS sequencing) from Shark River Slough and Taylor Slough, Everglades National Park (FCE LTER), Florida, USA, February 2019 - October 2020 [Dataset]. https://search.dataone.org/view/https%3A%2F%2Fpasta.lternet.edu%2Fpackage%2Fmetadata%2Feml%2Fknb-lter-fce%2F1265%2F1
    Dataset updated
    Feb 21, 2024
    Dataset provided by
    Long Term Ecological Research Network: http://www.lternet.edu/
    Authors
    Jun Zhao; Willm Martens-Habbena
    Time period covered
    Jan 1, 2019 - Jan 1, 2020
    Variables measured
    run, bases, bytes, depth, consent, version, SITENAME, latitude, organism, platform, and 30 more
    Description

    Global sea-level rise is transforming coastal ecosystems, especially freshwater wetlands, in part due to increased saltwater exposure, leading to changes in soil microbial communities and many important biogeochemical processes. Given the high spatial and temporal heterogeneity in coastal wetlands, especially in tropical or subtropical climates characterized by seasonal temperature, precipitation, and tidal fluctuations, it remains unclear which environmental factors influence the composition of soil microbial communities in wetlands affected by varying degrees of seawater intrusion. To address this, a two-year survey of microbial community structure was conducted in submerged surface soils from 14 wetland sites across the Florida Everglades, representing three major ecosystem types, i.e., freshwater marshes, mangrove forests, and seagrass meadows. Bulk surface soil samples were collected at each site from February 2019 to October 2020 to cover dry and wet seasons. In addition to bulk soil samples, soil cores were collected from each site in August 2020 to assess vertical gradients of microbial communities. The dataset contains amplicon sequencing data of the 16S rRNA gene (both bulk soil and soil cores) and the ITS region (bulk soil only). The 2019 to 2020 data are published in Zhao et al. 2023. A detailed list of sequence data and their accession numbers in GenBank is provided, and data collection is complete. This data package is an inventory of sequence read archive (SRA) entries available through GenBank BioProject PRJNA804243 (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA804243), PRJNA804246 (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA804246), and PRJNA804228 (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA804228). This data package is associated with the following publication: Zhao, J., Chakrabarti, S., Chambers, R., Weisenhorn, P., Travieso, R., Stumpf, S., Standen, E., Briceno, H., Troxler, T., Gaiser, E., Kominoski, J., Dhillon, B., & Martens-Habbena, W. (2023). Year-around survey and manipulation experiments reveal differential sensitivities of soil prokaryotic and fungal communities to saltwater intrusion in Florida Everglades wetlands. Science of The Total Environment, 858, 159865, https://doi.org/10.1016/j.scitotenv.2022.159865. Instead of citing this package, which is an inventory, please cite the original GenBank data or journal article, as appropriate. Citation guidance for the journal article is available on the respective publisher's website.
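    The BioProject links in this description all follow one URL pattern, `https://www.ncbi.nlm.nih.gov/bioproject/?term=<accession>`. The following Python sketch rebuilds those links from the accession list, assuming that pattern keeps working; the accession check is a simplified assumption ("PRJ" followed by two letters and digits, as in PRJNA804243), and `bioproject_url` is a hypothetical helper.

```python
import re
from urllib.parse import urlencode

# URL pattern copied from the links in the description.
BIOPROJECT_BASE = "https://www.ncbi.nlm.nih.gov/bioproject/"


def bioproject_url(accession: str) -> str:
    """Return the NCBI BioProject search URL for an accession like PRJNA804243."""
    # Simplified format check: "PRJ" + two letters + digits.
    if not re.fullmatch(r"PRJ[A-Z]{2}\d+", accession):
        raise ValueError(f"not a BioProject accession: {accession!r}")
    return BIOPROJECT_BASE + "?" + urlencode({"term": accession})


for acc in ["PRJNA804243", "PRJNA804246", "PRJNA804228"]:
    print(bioproject_url(acc))
```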

  13. Homogentisate 1,2-dioxygenase (HmgA) and laccase (Lac) references taken from...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Feb 20, 2013
    Cite
    Vicente, Vania A.; Sun, Jiufeng; van den Ende, Albertus H. G. Gerrits; Najafzadeh, Mohammed J.; Feng, Peiying; Xi, Liyan; De Hoog, Gerrit S. (2013). Homogentisate 1,2-dioxygenase (HmgA) and laccase (Lac) references taken from GenBank. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001734525
    Dataset updated
    Feb 20, 2013
    Authors
    Vicente, Vania A.; Sun, Jiufeng; van den Ende, Albertus H. G. Gerrits; Najafzadeh, Mohammed J.; Feng, Peiying; Xi, Liyan; De Hoog, Gerrit S.
    Description

    • CBS: Centraalbureau voor Schimmelcultures Fungal Biodiversity Centre, Utrecht, Netherlands.
    • NIH: The National Institutes of Health, Bethesda, Maryland, USA.
    • ATCC: American Type Culture Collection, Manassas, VA, USA.
    • FGSC: The Fungal Genetics Stock Center, Kansas City, Missouri, USA.
    • NRRL: ARS Culture Collection, Washington DC, USA.

  14. ncbi

    • huggingface.co
    Updated Mar 9, 2025
    Cite
    Chengguang Gan (2025). ncbi [Dataset]. https://huggingface.co/datasets/ganchengguang/ncbi
    Dataset updated
    Mar 9, 2025
    Authors
    Chengguang Gan
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    ganchengguang/ncbi dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. ITS and LSU sequences, phylogenetic trees

    • data.niaid.nih.gov
    • search.dataone.org
    zip
    Updated Jul 4, 2022
    Cite
    Rubab Khurshid; Arooj Naseer; Israr Ahmad; Abdual Nasir Khalid (2022). ITS and LSU sequences, phylogenetic trees [Dataset]. http://doi.org/10.5061/dryad.9ghx3ffkt
    Available download formats: zip
    Dataset updated
    Jul 4, 2022
    Dataset provided by
    Women University of Azad Jammu and Kashmir, Bagh
    University of the Punjab
    Authors
    Rubab Khurshid; Arooj Naseer; Israr Ahmad; Abdual Nasir Khalid
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    Russula kashmiriana sp. nov. (subg. Tenellula, sect. Laricinae) has been collected and described from Himalayan coniferous forest of Azad Jammu and Kashmir (AJ&K), Pakistan. The taxon is characterized by a bright reddish orange pileus with cracked margins and an obvious reddish orange circle on the pileus, yellow context, and light olive to yellowish amyloid basidiospores. Micro- and macroscopic features, along with molecular phylogenetic analysis based on large subunit (LSU) and internal transcribed spacer (ITS) sequence data, confirmed the status of Russula kashmiriana as a distinct species. Keywords: AJ&K, LSU, Phylogeny, Russulaceae.

    Methods: DNA was extracted from gills of dried fruiting bodies by a modified CTAB method, and the rDNA ITS region was amplified using the universal primer pair ITS1F and ITS4 (Gardes and Bruns, 1993; Bruns, 1995). The LSU regions were amplified using the LR0R and LR5 primers (Ge et al., 2014). The PCR products were sequenced using the same primers (Macrogen, Korea). BioEdit version 7.2.5 (Hall, 1999) was used to generate consensus sequences of both the LSU and ITS regions. Nucleotide sequence comparisons were performed with the Basic Local Alignment Search Tool (BLAST) network services using the National Center for Biotechnology Information (NCBI), USA database (http://www.ncbi.nlm.nih.gov/). For sequence alignment and phylogenetic tree construction, closely related sequences were retrieved from GenBank. Published sequences and data from the literature were also added to the final dataset. The online tool MUSCLE was used for alignment (http://www.ebi.ac.uk/Tools/msa/muscle/). Phylogenetic analyses were performed in MEGA6.

  16. Sample collection information and sequence accessions at the National Center...

    • bco-dmo.org
    • search.dataone.org
    csv
    Updated Apr 10, 2024
    Cite
    John J. Stachowicz (2024). Sample collection information and sequence accessions at the National Center for Biotechnology Information (NCBI) for whole genome sequencing of eelgrass (Zostera marina) collected at Bodega and Tomales Bay, CA, USA from July to September 2019 [Dataset]. http://doi.org/10.26008/1912/bco-dmo.924786.1
    Available download formats: csv (25.73 KB)
    Dataset updated
    Apr 10, 2024
    Dataset provided by
    Biological and Chemical Data Management Office
    Authors
    John J. Stachowicz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 16, 2019 - Sep 30, 2019
    Variables measured
    Site, latitude, organism, accession, longitude, sample_name, collection_date, isolation_source, bioproject_accession
    Measurement technique
    Fluorometer, Agarose Gel Electrophoresis System, Automated DNA Sequencer
    Description

    This dataset includes sample collection information and sequence accessions at the National Center for Biotechnology Information (NCBI) for whole genome sequencing of eelgrass (Zostera marina) collected at Bodega and Tomales Bay, California, USA, from July to September 2019. Sequence Read Archive (SRA) Experiments and BioSamples can be accessed from NCBI BioProject PRJNA887384 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA887384/).

    Results summary as described in Scheibelhut, et al. (2023): We examine genomic signals of selection in the eelgrass Zostera marina across temperature gradients in adjacent embayments. Although we find many genomic regions with signals of selection within each bay there is very little overlap in signals of selection at the SNP level, despite most polymorphisms being shared across bays. We do find overlap at the gene level, potentially suggesting multiple mutational pathways to the same phenotype. Using polygenic models we find that some sets of candidate SNPs are able to predict temperature across both bays, suggesting that small but parallel shifts in allele frequencies may be missed by independent genome scans. Together, these results highlight the continuous rather than binary nature of parallel evolution in polygenic traits and the complexity of evolutionary predictability.

  17. DNA barcoding Brooklyn (New York): A first assessment of biodiversity in...

    • plos.figshare.com
    xlsx
    Updated Jun 3, 2023
    Cite
    Christine Marizzi; Antonia Florio; Melissa Lee; Mohammed Khalfan; Cornel Ghiban; Bruce Nash; Jenna Dorey; Sean McKenzie; Christine Mazza; Fabiana Cellini; Carlo Baria; Ron Bepat; Lena Cosentino; Alexander Dvorak; Amina Gacevic; Cristina Guzman-Moumtzis; Francesca Heller; Nicholas Alexander Holt; Jeffrey Horenstein; Vincent Joralemon; Manveer Kaur; Tanveer Kaur; Armani Khan; Jessica Kuppan; Scott Laverty; Camila Lock; Marianne Pena; Ilona Petrychyn; Indu Puthenkalam; Daval Ram; Arlene Ramos; Noelle Scoca; Rachel Sin; Izabel Gonzalez; Akansha Thakur; Husan Usmanov; Karen Han; Andy Wu; Tiger Zhu; David Andrew Micklos (2023). DNA barcoding Brooklyn (New York): A first assessment of biodiversity in Marine Park by citizen scientists [Dataset]. http://doi.org/10.1371/journal.pone.0199015
    Available download formats: xlsx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Christine Marizzi; Antonia Florio; Melissa Lee; Mohammed Khalfan; Cornel Ghiban; Bruce Nash; Jenna Dorey; Sean McKenzie; Christine Mazza; Fabiana Cellini; Carlo Baria; Ron Bepat; Lena Cosentino; Alexander Dvorak; Amina Gacevic; Cristina Guzman-Moumtzis; Francesca Heller; Nicholas Alexander Holt; Jeffrey Horenstein; Vincent Joralemon; Manveer Kaur; Tanveer Kaur; Armani Khan; Jessica Kuppan; Scott Laverty; Camila Lock; Marianne Pena; Ilona Petrychyn; Indu Puthenkalam; Daval Ram; Arlene Ramos; Noelle Scoca; Rachel Sin; Izabel Gonzalez; Akansha Thakur; Husan Usmanov; Karen Han; Andy Wu; Tiger Zhu; David Andrew Micklos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Marine Park, Brooklyn, New York
    Description

    DNA barcoding is both an important research tool and an important science education tool. The technique allows quick and accurate species identification using only minimal amounts of tissue taken from any organism at any developmental stage. DNA barcoding has many practical applications, including furthering the study of taxonomy and monitoring biodiversity. In addition to these uses, DNA barcoding is a powerful tool to empower, engage, and educate students in the scientific method while conducting productive and creative research. The study presented here provides the first assessment of Marine Park (Brooklyn, New York, USA) biodiversity using DNA barcoding. New York City citizen scientists (high school students and their teachers) were trained to identify species using DNA barcoding during a two-week-long institute. By performing NCBI GenBank BLAST searches, students taxonomically identified 187 samples (1 fungus, 70 animals, and 116 plants) and also published 12 novel DNA barcodes on GenBank. Students also identified 7 ant species and demonstrated the potential of DNA barcoding for identifying this especially diverse group when coupled with traditional morphology-based taxonomy. Here we outline how DNA barcoding allows citizen scientists to make preliminary taxonomic identifications and contribute to modern biodiversity research.

  18. Parasite sample voucher numbers and 28S rRNA alignment file for parasites...

    • data.usgs.gov
    • s.cnmilf.com
    • +1 more
    Updated Jul 18, 2024
    Rebecca Cole (2024). Parasite sample voucher numbers and 28S rRNA alignment file for parasites collected from California Giant Salamander (Dicamptodon ensatus) from Webb Creek, in the Bear Creek Redwoods Preserve in Santa Clara County, California, USA [Dataset]. http://doi.org/10.5066/P13DTR5V
    Dataset updated
    Jul 18, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Rebecca Cole
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    Oct 31, 2021 - Nov 4, 2021
    Area covered
    Santa Clara County, California, United States
    Description

    Multiple sequence format (.msf) file used to construct a 28S rRNA phylogenetic tree of species of Euryhelmis and other heterophyid trematodes. Sequences were generated from metacercariae samples at the U.S. Geological Survey National Wildlife Health Center (NWHC) and compared for base-pair similarity with publicly available GenBank sequences using BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi). GenBank sequences were selected for inclusion in the .msf file based on percent identity (94% or higher to this study's sequences), coverage (98–100%), and the parasite life stage from which the sequence was generated. The dataset also includes an Excel file listing all parasites and DNA from NWHC vouchered with the University of New Mexico, Museum of Southwestern Biology, Division of Parasitology, along with the GenBank voucher numbers of the deposited partial 28S rRNA, mitochondrial CO1, and 18S rRNA gene sequences.
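
    The three-part selection rule for GenBank sequences (identity, coverage, life stage) can be sketched as a simple filter. This is a minimal illustration, not the authors' procedure: it assumes BLAST hits have already been parsed into records, and the `BlastHit` fields, the `select_hits` helper, and the placeholder accessions are all hypothetical. The source does not say how life stage was recorded, so it is modeled here as a plain string.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class BlastHit:
    # Hypothetical record for one BLAST hit; field names are illustrative only.
    accession: str         # GenBank accession of the subject sequence
    pct_identity: float    # percent identity to the query (0-100)
    coverage: float        # percent of the query covered by the alignment (0-100)
    life_stage: str        # parasite stage the GenBank sequence came from (assumed field)


def select_hits(
    hits: Iterable[BlastHit],
    min_identity: float = 94.0,
    min_coverage: float = 98.0,
    stages: Optional[Set[str]] = None,
) -> List[BlastHit]:
    """Keep hits with identity >= min_identity, coverage in [min_coverage, 100],
    and (optionally) a life stage from the allowed set."""
    selected = []
    for hit in hits:
        if hit.pct_identity < min_identity:
            continue
        if not (min_coverage <= hit.coverage <= 100.0):
            continue
        if stages is not None and hit.life_stage not in stages:
            continue
        selected.append(hit)
    return selected


# Usage with placeholder (non-real) accessions:
example_hits = [
    BlastHit("HYPOTHETICAL_A", 95.2, 99.0, "metacercaria"),
    BlastHit("HYPOTHETICAL_B", 93.9, 100.0, "metacercaria"),  # fails identity
    BlastHit("HYPOTHETICAL_C", 97.0, 96.5, "adult"),          # fails coverage
]
for hit in select_hits(example_hits, stages={"metacercaria"}):
    print(hit.accession)
```

    Keeping the thresholds as parameters makes it easy to see how sensitive the final alignment set is to the 94% and 98% cut-offs.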

  19. ncbi_disease

    • huggingface.co
    Updated May 23, 2024
    + more versions
    NLM/DIR BioNLP Group (2024). ncbi_disease [Dataset]. https://huggingface.co/datasets/ncbi/ncbi_disease
    Dataset updated
    May 23, 2024
    Dataset authored and provided by
    NLM/DIR BioNLP Group
    License

    Unknown: https://choosealicense.com/licenses/unknown/

    Description

    This paper presents the disease name and concept annotations of the NCBI disease corpus, a collection of 793 PubMed abstracts fully annotated at the mention and concept level to serve as a research resource for the biomedical natural language processing community. Each PubMed abstract was manually annotated by two annotators with disease mentions and their corresponding concepts in Medical Subject Headings (MeSH®) or Online Mendelian Inheritance in Man (OMIM®). Manual curation was performed using PubTator, which allowed pre-annotations to be used as a first step before manual annotation. Fourteen annotators were randomly paired, and differing annotations were discussed to reach consensus in two annotation phases. In this setting, high inter-annotator agreement was observed. Finally, all results were checked against the annotations of the rest of the corpus to ensure corpus-wide consistency.

    For more details, see: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3951655/

    The original dataset can be downloaded from: https://www.ncbi.nlm.nih.gov/CBBresearch/Dogan/DISEASE/NCBI_corpus.zip

    This dataset has been converted to CoNLL format for NER using the following tool: https://github.com/spyysalo/standoff2conll

    Note: the original data contain a duplicate document (PMID 8528200), and the duplicate is retained in the converted data.
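
    Reading the converted CoNLL files and recovering disease mentions from BIO tags can be sketched as follows. This assumes a common layout (one token per line, token in the first whitespace-separated column, BIO tag such as `B-Disease`/`I-Disease`/`O` in the last column, blank lines between sentences); the exact columns written by standoff2conll may differ. The `read_conll` and `bio_spans` helpers and the toy sample are hypothetical, not part of the dataset.

```python
from typing import Iterable, List, Optional, Tuple

Sentence = List[Tuple[str, str]]  # (token, BIO tag) pairs


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group token lines into sentences separated by blank lines.
    Assumes the token is the first column and the BIO tag is the last column."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split()
        current.append((cols[0], cols[-1]))
    if current:
        sentences.append(current)
    return sentences


def bio_spans(sentence: Sentence) -> List[Tuple[str, str]]:
    """Collect (entity_type, mention_text) pairs from BIO tags.
    An I- tag that does not continue the open entity starts a new one (lenient decoding)."""
    spans: List[Tuple[str, str]] = []
    tokens: List[str] = []
    etype: Optional[str] = None
    for token, tag in sentence:
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype):
            if etype is not None:
                spans.append((etype, " ".join(tokens)))
            etype, tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            tokens.append(token)
        else:  # "O" closes any open entity
            if etype is not None:
                spans.append((etype, " ".join(tokens)))
            etype, tokens = None, []
    if etype is not None:
        spans.append((etype, " ".join(tokens)))
    return spans


# Toy input in the assumed two-column layout (not taken from the corpus):
sample = """Familial O
breast B-Disease
cancer I-Disease
risk O

Ataxia B-Disease
telangiectasia I-Disease
"""
for sent in read_conll(sample.splitlines()):
    print(bio_spans(sent))
```

    Because PMID 8528200 appears twice in the data, counts of mentions or documents computed this way will include the duplicate unless it is removed upstream.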

  20. Genbank blastn and BOLD IDS best matches for Mississippi samples.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Joyce M. Sakamoto; Jerome Goddard; Jason L. Rasgon (2023). Genbank blastn and BOLD IDS best matches for Mississippi samples. [Dataset]. http://doi.org/10.1371/journal.pone.0101389.t003
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Joyce M. Sakamoto; Jerome Goddard; Jason L. Rasgon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Mississippi
    Description

    **Nearest match to sample from USA = U26605.1, Ixodes dammini strain IL94, Illinois.
