100+ datasets found
  1. NCBI Genome Survey Sequences Database

    • scicrunch.org
    • neuinfo.org
    • +1more
    Updated Jun 17, 2025
    Cite
    (2025). NCBI Genome Survey Sequences Database [Dataset]. http://identifiers.org/RRID:SCR_002146
    Description

    Database of unannotated, short, single-read, primarily genomic sequences from GenBank, including random survey sequences, clone-end sequences and exon-trapped sequences. The GSS division of GenBank is similar to the EST division, with the exception that most of the sequences are genomic in origin, rather than cDNA (mRNA). It should be noted that two classes (exon trapped products and gene trapped products) may be derived via a cDNA intermediate. Care should be taken when analyzing sequences from either of these classes, as a splicing event could have occurred and the sequence represented in the record may be interrupted when compared to genomic sequence. The GSS division contains (but is not limited to) the following types of data:

    * random single pass read genome survey sequences
    * cosmid/BAC/YAC end sequences
    * exon trapped genomic sequences
    * Alu PCR sequences
    * transposon-tagged sequences

    Although dbGSS sequences are incorporated into the GSS Division of GenBank, annotation in dbGSS is more comprehensive and includes detailed information about the contributors, experimental conditions, and genetic map locations.

  2. High Throughput Genomic Sequences Division

    • neuinfo.org
    • dknet.org
    • +1more
    Updated Oct 16, 2019
    Cite
    (2019). High Throughput Genomic Sequences Division [Dataset]. http://identifiers.org/RRID:SCR_002150
    Description

    Database of high-throughput genome sequences from large-scale genome sequencing centers, including unfinished and finished sequences. It was created to accommodate a growing need to make unfinished genomic sequence data rapidly available to the scientific community in a coordinated effort among the International Nucleotide Sequence databases, DDBJ, EMBL, and GenBank. Sequences are prepared for submission by using NCBI's software tools Sequin or tbl2asn. Each center has an FTP directory into which new or updated sequence files are placed. Sequence data in this division are available for BLAST homology searches against either the htgs database or the month database, which includes all new submissions for the prior month. Unfinished HTG sequences containing contigs greater than 2 kb are assigned an accession number and deposited in the HTG division. A typical HTG record might consist of all the first-pass sequence data generated from a single cosmid, BAC, YAC, or P1 clone, which together make up more than 2 kb and contain one or more gaps. A single accession number is assigned to this collection of sequences, and each record includes a clear indication of the status (phase 1 or 2) plus a prominent warning that the sequence data are unfinished and may contain errors. The accession number does not change as sequence records are updated; only the most recent version of a HTG record remains in GenBank.

  3. ARS Microbial Genomic Sequence Database Server

    • catalog.data.gov
    • datadiscoverystudio.org
    • +1more
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). ARS Microbial Genomic Sequence Database Server [Dataset]. https://catalog.data.gov/dataset/ars-microbial-genomic-sequence-database-server-1b81c
    Dataset provided by
    Agricultural Research Service
    Description

    This database server is supported in fulfilment of the research mission of the Mycotoxin Prevention and Applied Microbiology Research Unit at the National Center for Agricultural Utilization Research in Peoria, Illinois. The linked website provides access to gene sequence databases for various groups of microorganisms, such as Streptomyces species or Aspergillus species and their relatives, that are the product of ARS research programs. The sequence databases are organized in the BIGSdb (Bacterial Isolate Genomic Sequence Database) software package developed by Keith Jolley and Martin Maiden at Oxford University.

    Resources in this dataset: Resource Title: ARS Microbial Genomic Sequence Database Server. File Name: Web Page, url: http://199.133.98.43

  4. COVID-19 Genome Sequence Dataset

    • registry.opendata.aws
    • catalog.midasnetwork.us
    Updated Jul 9, 2020
    Cite
    National Library of Medicine (NLM) (2020). COVID-19 Genome Sequence Dataset [Dataset]. https://registry.opendata.aws/ncbi-covid-19/
    Dataset provided by
    National Library of Medicine (NLM), http://nlm.nih.gov/
    Description

    This repository within the ACTIV TRACE initiative houses a comprehensive collection of datasets related to SARS-CoV-2. The processing of SARS-CoV-2 Sequence Read Archive (SRA) files has been optimized to identify genetic variations in viral samples. This information is then presented in the Variant Call Format (VCF). Each VCF file corresponds to the SRA parent-run's accession ID. Additionally, the data is available in the parquet format, making it easier to search and filter using the Amazon Athena Service. The SARS-CoV-2 Variant Calling Pipeline is designed to handle new data every six hours, with updates to the AWS ODP bucket occurring daily.
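The description above says variants are published as VCF files, one per SRA run accession. As a rough illustration of what reading such a file involves, here is a minimal sketch in Python that parses the eight fixed tab-separated VCF columns. The `Variant` type, the `parse_vcf` function and the sample record are hypothetical and made up for illustration; they are not taken from the dataset or its pipeline.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple


class Variant(NamedTuple):
    """One VCF data line, reduced to a few commonly used fields (hypothetical type)."""
    chrom: str
    pos: int
    ref: str
    alts: List[str]
    info: Dict[str, str]


def parse_vcf(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield a Variant for every data line, skipping '#' header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
        chrom, pos, _id, ref, alt, _qual, _filter, info = line.split("\t")[:8]
        info_map: Dict[str, str] = {}
        if info != ".":
            for item in info.split(";"):
                key, _, value = item.partition("=")  # flag keys get an empty value
                info_map[key] = value
        yield Variant(chrom, int(pos), ref, alt.split(","), info_map)


# Made-up record for illustration only; not copied from the dataset.
sample = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "NC_045512.2\t23403\t.\tA\tG\t.\tPASS\tDP=100",
]
variants = list(parse_vcf(sample))
```

For real files, an established VCF library is likely a better choice than hand-rolled parsing; the sketch only shows the column layout.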

  5. The results of whole genome sequence database (the TrueBacTM ID-Genome...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Oh Joo Kweon; Yong Kwan Lim; Hye Ryoun Kim; Tae-Hyoung Kim; Sung-min Ha; Mi-Kyung Lee (2023). The results of whole genome sequence database (the TrueBacTM ID-Genome system) matching for the novel Cupriavidus species. [Dataset]. http://doi.org/10.1371/journal.pone.0232850.t001
    Dataset provided by
    PLOS ONE
    Authors
    Oh Joo Kweon; Yong Kwan Lim; Hye Ryoun Kim; Tae-Hyoung Kim; Sung-min Ha; Mi-Kyung Lee
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The results of whole genome sequence database (the TrueBacTM ID-Genome system) matching for the novel Cupriavidus species.

  6. Data from: SoyBase and the Soybean Breeder's Toolbox

    • agdatacommons.nal.usda.gov
    • gimi9.com
    • +3more
    bin
    Updated Feb 8, 2024
    Cite
    David M. Grant (2024). SoyBase and the Soybean Breeder's Toolbox [Dataset]. http://doi.org/10.15482/USDA.ADC/1212265
    Dataset provided by
    Ag Data Commons
    Authors
    David M. Grant
    License

    U.S. Government Works, https://www.usa.gov/government-works
    License information was derived automatically

    Description

    SoyBase is a repository for genetics, genomics and related data resources for soybean. It contains current genetic, physical and genomic sequence maps integrated with qualitative and quantitative traits. The SoyBase database was established in the 1990s as the USDA Soybean Genetics Database. Originally, it contained only genetic information about soybeans, such as genetic maps and information about the Mendelian genetics of soybean. In time, SoyBase was expanded to include molecular data regarding soybean genes and sequences as they became available. In 2010, the soybean genome sequence was published, and it and supporting gene sequences have been integrated into the SoyBase sequence browser. SoyBase genetic maps were used in the assembly of both the Williams 82 2010 assembly (Wm82.a1.v1) and the newest genome assembly (Wm82.a2.v1). SoyBase also incorporates information about mutant and other soybean genetic stocks and serves as a contact point for ordering strains from those populations. As association analyses continue due to various re-sequencing efforts, SoyBase will also incorporate those data into the soybean genome browser as they become available. Gene expression patterns are also available at SoyBase through the SoyBase expression pages and the Soybean Gene Atlas. Other expression/transcriptome/methylomic data sets also have been and continue to be incorporated into the SoyBase genome browser.

    Project No: 3625-21000-062-00D. Accession No: 0425040.

    Resources in this dataset: Resource Title: SoyBase, the USDA-ARS soybean genetics and genomics database web site. File Name: Web Page, url: https://soybase.org

  7. Data from: Cacao Genome Database

    • s.cnmilf.com
    • datasets.ai
    • +1more
    Updated Apr 21, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Cacao Genome Database [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/cacao-genome-database-0d068
    Dataset provided by
    Agricultural Research Service, https://www.ars.usda.gov/
    Description

    Not only is cacao the basic ingredient in the world’s favorite confection, chocolate, but it provides a livelihood for over 6.5 million farmers in Africa, South America and Asia and ranks as one of the top ten agricultural commodities in the world. Historically, cocoa production has been plagued by serious losses due to pests and diseases. The release of the cacao genome sequence will provide researchers with access to the latest genomic tools, enabling more efficient research and accelerating the breeding process, thereby expediting the release of superior cacao cultivars. The sequenced genotype, Matina 1-6, is representative of the genetic background most commonly found in the cacao producing countries, enabling results to be applied immediately and broadly to current commercial cultivars. Matina 1-6 is highly homozygous, which greatly reduces the complexity of the sequence assembly process. While the sequence provided is a preliminary release, it already covers 92% of the genome, with approximately 35,000 genes. We will continue to refine the assembly and annotation, working toward a complete finished sequence. Updates will be made available via the main project website.

    Resources in this dataset: Resource Title: Cacao Genome Database. File Name: Web Page, url: http://www.cacaogenomedb.org/

  8. Sequencing of Idd regions in the NOD mouse genome

    • rrid.site
    • dknet.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). Sequencing of Idd regions in the NOD mouse genome [Dataset]. http://identifiers.org/RRID:SCR_001483
    Description

    Genetic variations associated with type 1 diabetes identified by sequencing regions of the non-obese diabetic (NOD) mouse genome and comparing them with the same areas of a diabetes-resistant C57BL/6J reference mouse allowing identification of single nucleotide polymorphisms (SNPs) or other genomic variations putatively associated with diabetes in mice. Finished clones from the targeted insulin-dependent diabetes (Idd) candidate regions are displayed in the NOD clone sequence section of the website, where they can be downloaded either as individual clone sequences or larger contigs that make up the accession golden path (AGP). All sequences are publicly available via the International Nucleotide Sequence Database Collaboration. Two NOD mouse BAC libraries were constructed and the BAC ends sequenced. Clones from the DIL NOD BAC library constructed by RIKEN Genomic Sciences Centre (Japan) in conjunction with the Diabetes and Inflammation Laboratory (DIL) (University of Cambridge) from the NOD/MrkTac mouse strain are designated DIL. Clones from the CHORI-29 NOD BAC library constructed by Pieter de Jong (Children's Hospital, Oakland, California, USA) from the NOD/ShiLtJ mouse strain are designated CHORI-29. All NOD mouse BAC end-sequences have been submitted to the International Nucleotide Sequence Database Consortium (INSDC), deposited in the NCBI trace archive. They have generated a clone map from these two libraries by mapping the BAC end-sequences to the latest assembly of the C57BL/6J mouse reference genome sequence. These BAC end-sequence alignments can then be visualized in the Ensembl mouse genome browser where the alignments of both NOD BAC libraries can be accessed through the Distributed Annotation System (DAS). The Mouse Genomes Project has used the Illumina platform to sequence the entire NOD/ShiLtJ genome and this should help to position unaligned BAC end-sequences to novel non-reference regions of the NOD genome. 
Further information about the BAC end-sequences, such as their alignment, variation data and Ensembl gene coverage, can be obtained from the NOD mouse ftp site.

  9. “kingdom_name” peptides in the MLI samples.

    • plos.figshare.com
    xlsx
    Updated Jun 29, 2023
    Cite
    Matthys G. Potgieter; Andrew J. M. Nel; Suereta Fortuin; Shaun Garnett; Jerome M. Wendoh; David L. Tabb; Nicola J. Mulder; Jonathan M. Blackburn (2023). “kingdom_name” peptides in the MLI samples. [Dataset]. http://doi.org/10.1371/journal.pcbi.1011163.s008
    Dataset provided by
    PLOS Computational Biology
    Authors
    Matthys G. Potgieter; Andrew J. M. Nel; Suereta Fortuin; Shaun Garnett; Jerome M. Wendoh; David L. Tabb; Nicola J. Mulder; Jonathan M. Blackburn
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Microbiome research is providing important new insights into the metabolic interactions of complex microbial ecosystems involved in fields as diverse as the pathogenesis of human diseases, agriculture and climate change. Poor correlations typically observed between RNA and protein expression datasets make it hard to accurately infer microbial protein synthesis from metagenomic data. Additionally, mass spectrometry-based metaproteomic analyses typically rely on focused search sequence databases based on prior knowledge for protein identification that may not represent all the proteins present in a set of samples. Metagenomic 16S rRNA sequencing only targets the bacterial component, while whole genome sequencing is at best an indirect measure of expressed proteomes. Here we describe a novel approach, MetaNovo, that combines existing open-source software tools to perform scalable de novo sequence tag matching with a novel algorithm for probabilistic optimization of the entire UniProt knowledgebase to create tailored sequence databases for target-decoy searches directly at the proteome level, enabling metaproteomic analyses without prior expectation of sample composition or metagenomic data generation and compatible with standard downstream analysis pipelines.

    Results: We compared MetaNovo to published results from the MetaPro-IQ pipeline on 8 human mucosal-luminal interface samples, with comparable numbers of peptide and protein identifications, many shared peptide sequences and a similar bacterial taxonomic distribution compared to that found using a matched metagenome sequence database, but simultaneously identified many more non-bacterial peptides than the previous approaches. MetaNovo was also benchmarked on samples of known microbial composition against matched metagenomic and whole genomic sequence database workflows, yielding many more MS/MS identifications for the expected taxa, with improved taxonomic representation, while also highlighting previously described genome sequencing quality concerns for one of the organisms, and identifying an experimental sample contaminant without prior expectation.

    Conclusions: By estimating taxonomic and peptide level information directly on microbiome samples from tandem mass spectrometry data, MetaNovo enables the simultaneous identification of peptides from all domains of life in metaproteome samples, bypassing the need for curated sequence databases to search. We show that the MetaNovo approach to mass spectrometry metaproteomics is more accurate than current gold standard approaches of tailored or matched genomic sequence database searches, can identify sample contaminants without prior expectation and yields insights into previously unidentified metaproteomic signals, building on the potential for complex mass spectrometry metaproteomic data to speak for itself.

  10. SpBase - Strongylocentrotus purpuratus: the Sea Urchin Genome Database

    • rrid.site
    • scicrunch.org
    Updated Jun 3, 2025
    Cite
    (2025). SpBase - Strongylocentrotus purpuratus: the Sea Urchin Genome Database [Dataset]. http://identifiers.org/RRID:SCR_007441
    Description

    SpBase is designed to present the results of the genome sequencing project for the purple sea urchin. The sequences and annotations emerging from this effort are organized in a database that provides the research community access to those data not normally presented through the National Center for Biotechnology Information and other large databases. Additionally, the unique information that links gene identities and sequences to the plate and well location to the library filters from the Sea Urchin genome Resource will also be presented. The software used to organize and present the sea urchin genome comes from GMOD, a collection of open source software tools for creating and managing genome-scale biological databases. That sea urchin eggs and embryos have long remained a popular research subject for cell and developmental biologists is one rationale for sequencing the genome. In addition, studies of embryonic development in the California Purple Sea Urchin, Strongylocentrotus purpuratus, have paralleled the emergence of molecular techniques ranging from the characterization of genomic repeat sequences in the 1970s to the elucidation of gene regulatory networks in recent times. The parent of this site, SUGP, was meant to provide a focal point for the exchange of genomic information as the genome of the purple sea urchin was being sequenced. Over these past years it has served as a repository for small sequencing projects and a source of sequence information useful for gene discovery projects. Here one could find information on macro-array libraries of cDNAs from the purple sea urchin and genomic DNA from several species. In addition, a Sequence Tag Connector (STC) collection has been assembled from 5% of the genome sequence and a very extensive repeat sequence catalog prepared. All of the sequence data that we maintained at SUGP was incorporated into the new SpBase. Of course, it is all in public sequence databases such as the National Center for Biotechnology Information as well. Some additional sequence information is available at the Resource Center of the German Human Genome Project. With the publication of The Genome of the Sea Urchin Strongylocentrotus purpuratus by The Sea Urchin Genome Sequencing Consortium, a link to the first 9941 gene annotations is now publicly available. The effort to sequence the whole purple sea urchin genome was a cooperative one that included contributions from the Sea Urchin Genome Facility here at the Center for Computational Regulatory Genomics, Beckman Institute, Caltech, and support from the Human Genome Research Institute of the National Institutes of Health. The sequencing was done at the Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas. Funding was approved based on an initiative submitted by the Sea Urchin Genome Advisory Committee.

  11. Mouse Genome Database

    • neuinfo.org
    • dknet.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). Mouse Genome Database [Dataset]. http://identifiers.org/RRID:SCR_012953
    Description

    Community model organism database for the laboratory mouse and authoritative source for phenotype and functional annotations of mouse genes. MGD includes a complete catalog of mouse genes and genome features with integrated access to genetic, genomic and phenotypic information, all serving to further the use of the mouse as a model system for studying human biology and disease. MGD is a major component of Mouse Genome Informatics. It contains standardized descriptions of mouse phenotypes, associations between mouse models and human genetic diseases, extensive integration of DNA and protein sequence data, and a normalized representation of genome and genome variant information. Data are obtained and integrated via manual curation of the biomedical literature, direct contributions from individual investigators and downloads from major informatics resource centers. MGD collaborates with the bioinformatics community on the development and use of biomedical ontologies such as the Gene Ontology (GO) and the Mammalian Phenotype (MP) Ontology.

  12. Genome Reviews

    • dknet.org
    Updated Oct 16, 2019
    Cite
    (2019). Genome Reviews [Dataset]. http://identifiers.org/RRID:SCR_007685
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE, documented April 24, 2017. The Genome Reviews database provides an up-to-date, standardized and comprehensively annotated view of the genomic sequence of organisms with completely deciphered genomes. Currently, Genome Reviews contains the genomes of archaea, bacteria, bacteriophages and selected eukaryota. Genome Reviews is available as a MySQL relational database, or a flat file format derived from that in the EMBL Nucleotide Sequence Database. An Ensembl-style browser is now available for Genome Reviews, providing a zoomable graphical view of all chromosomes and plasmids represented in the database. The location and structure of all genes is shown and the distribution of features throughout the sequence is displayed.

  13. Darwin: an amino acid sequence collection of complete proteomes from...

    • explore.openaire.eu
    • zenodo.org
    Updated Mar 6, 2020
    Cite
    Joe Win; Sophien Kamoun (2020). Darwin: an amino acid sequence collection of complete proteomes from eukaryotes with different phylogenetic affinities (v. 03_2020_137) [Dataset]. http://doi.org/10.5281/zenodo.3699564
    Authors
    Joe Win; Sophien Kamoun
    Description

    Background: Every time we find an interesting gene in an organism of interest, the first question is often “how widely is this gene distributed in the eukaryotic kingdom?”. Naturally, one could use NCBI BLAST search against the non-redundant sequence database provided by GenBank to answer this question. However, it can be cumbersome to parse the results and assign them to taxonomic units. It is also not straightforward to get an overview of which eukaryotic groups are represented in the results. Top BLAST hits can be crowded with sequences from closely-related organisms, making it difficult to gain an overview of the overall distribution across eukaryotes. To streamline this process, we developed an in-house database of complete eukaryotic proteomes. We tagged each sequence with a eukaryotic group handle (two-character symbol) and combined them into a single data set searchable by standalone BLAST on one’s own computer. We named this data set “Darwin” to reflect the diverse nature of the sequences it contains.

    Methods: We downloaded predicted proteomes in FASTA format from different sources such as GenBank, Joint Genome Institute (Department of Energy, USA), Broad Institute (Massachusetts Institute of Technology, USA), Phytozome and a number of other specialized websites catering for a specific organism, such as the Arabidopsis Information Resource (TAIR) or the Saccharomyces Genome Database (SGD). All the organisms we included in Darwin are listed in Table 1. To reduce redundancy, we took care not to include the same species more than once unless subspecies were known to show wide diversity. Each sequence header was tagged with a eukaryotic group handle composed of two-character symbols (based on Keeling et al., 2005). These handles clearly appear in BLAST output and can be parsed easily. We combined sequences from all proteomes into a single data set and named it “Darwin”.

    Results: The current version of Darwin (v. 03_2020_137) contains 2,601,132 amino acid sequences from 137 eukaryotes (Table 1, Data file 1). The sizes of the proteomes were diverse, ranging from ~4000 sequences in some alveolates to 60,000-76,000 in plants. Darwin represents most of the supergroups of the eukaryotic kingdom described in Keeling et al. (2005) except those in Rhizaria, whose genomes were not available at the time of data set construction. The data set contains larger numbers of proteomes from fungi and plants, reflecting areas of interest in our group.

    Conclusions: Darwin is provided as a text FASTA file that can be formatted for BLAST searches on standalone computers. The results from the BLAST searches can be parsed to determine how widely a gene of interest is distributed among different eukaryotes. Simple counting of the eukaryotic group handles would also yield an overview of the distribution across taxa. Darwin is also useful for rapidly finding out whether a gene is missing in particular taxa.

    Reference: Keeling PJ, Burger G, Durnford DG, Lang BF, Lee RW, Pearlman RE, Roger AJ, Gray MW (2005) The tree of eukaryotes. Trends Ecol. Evol. 20: 670-676
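The "simple counting of the eukaryotic group handles" mentioned above could look roughly like the following Python sketch. It reads BLAST tabular output (`-outfmt 6`, where the second column is the subject sequence ID) and tallies handles over distinct subject sequences. The description does not say where the handle sits in the header, so the sketch assumes it is a prefix separated from the rest of the ID by an underscore; that layout, the handles `PL` and `FU`, the sequence IDs and the function name `count_group_handles` are all hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_group_handles(blast_rows: Iterable[str], sep: str = "_") -> Counter:
    """Count distinct subject sequences per group handle in BLAST outfmt 6 rows.

    Assumes (hypothetically) that each subject ID starts with the handle,
    followed by `sep`, e.g. "PL_seq001".
    """
    counts: Counter = Counter()
    seen = set()
    for row in blast_rows:
        row = row.strip()
        if not row or row.startswith("#"):
            continue
        subject_id = row.split("\t")[1]  # column 2 of outfmt 6 is sseqid
        if subject_id in seen:
            continue  # several alignments to one subject count once
        seen.add(subject_id)
        counts[subject_id.split(sep, 1)[0]] += 1
    return counts


# Made-up rows for illustration only.
rows = [
    "query1\tPL_seq001\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t180",
    "query1\tPL_seq002\t70.0\t190\t57\t0\t5\t194\t3\t192\t1e-30\t120",
    "query1\tFU_seq010\t55.0\t180\t81\t0\t10\t189\t8\t187\t1e-10\t60.5",
    "query1\tPL_seq001\t60.0\t50\t20\t0\t210\t259\t220\t269\t1e-05\t40.1",
]
counts = count_group_handles(rows)
```

A handle that is absent from the resulting counter would suggest, under these assumptions, that no hit was found in that eukaryotic group at the chosen BLAST thresholds.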

  14. PeanutBase

    • agdatacommons.nal.usda.gov
    • datasets.ai
    • +2more
    bin
    Updated Feb 8, 2024
    Cite
    USDA Agricultural Research Service (2024). PeanutBase [Dataset]. http://doi.org/10.15482/USDA.ADC/1352915
    Dataset provided by
    Agricultural Research Service, https://www.ars.usda.gov/
    Authors
    USDA Agricultural Research Service
    License

    U.S. Government Works, https://www.usa.gov/government-works
    License information was derived automatically

    Description

    PeanutBase (peanutbase.org) is the primary genetics and genomics database for cultivated peanut and its wild relatives. It houses information about genome sequences, genes and predicted functions, genetic maps, markers, links to germplasm resources, and maps of peanut germplasm origins. This resource is being developed for U.S. and international peanut researchers and breeders, with support from The Peanut Foundation and the many contributors that have made the Peanut Genomics Initiative possible. Funded by The Peanut Foundation as part of the Peanut Genomics Initiative. Additional support from USDA-ARS. Database developed and hosted by the USDA-ARS SoyBase and Legume Clade Database group at Ames, IA, with NCGR and other participants.

    Resources in this dataset: Resource Title: PeanutBase.org. File Name: Web Page, url: https://peanutbase.org

    Website pointer for PeanutBase.org - Genetic and genomic data to enable more rapid crop improvement in peanuts. The peanut genome has been sequenced and analyzed as part of the International Peanut Genomic Initiative, in order to accelerate breeding progress and get more productive, disease-resistant, stress-tolerant varieties to farmers. The two diploid progenitors have been sequenced and are available, along with predicted genes and descriptions. The genomes of the diploid progenitors will be used to help identify and assemble the similar chromosomes in cultivated peanut. Cultivated peanut, Arachis hypogaea, is an allotetraploid (2n=4x=40) that contains two complete genomes, labeled the A and B genomes. A. duranensis (2n=2x=20) has likely contributed the A genome, and A. ipaensis has likely contributed the B genome. It may be helpful to remember these two associations by using the mnemonic: "A" comes before "B" and "duranensis" comes before "ipaensis". Because of the difficulty of assembling a tetraploid genome, the two diploids, A. duranensis and A. ipaensis, have been sequenced and assembled first. Together these provide a good initial basis for the tetraploid genome. Additionally, the two will help guide assembly of the tetraploid genome. Sequencing work on the tetraploid genome is underway; stay tuned for updates in 2015.

  15. Data from: MaizeGDB

    • agdatacommons.nal.usda.gov
    • datadiscoverystudio.org
    • +2more
    bin
    Updated Feb 9, 2024
    + more versions
    Cite
    USDA Agricultural Research Service (2024). MaizeGDB [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/MaizeGDB/24660768
    Dataset provided by
    Agricultural Research Service, https://www.ars.usda.gov/
    Authors
    USDA Agricultural Research Service
    License

    U.S. Government Works, https://www.usa.gov/government-works
    License information was derived automatically

    Description

    MaizeGDB is a community-oriented, long-term, federally funded informatics service to researchers focused on the crop plant and model organism Zea mays. Genomic, genetic, sequence, germplasm, gene product, metabolic pathway, functional characterization, literature reference, diversity, and expression data are among the data types stored at MaizeGDB. The project's website provides custom interfaces that let researchers browse data and search for specific information matching explicit criteria. First released in 1991 under the name MaizeDB, the Maize Genetics and Genomics Database, now MaizeGDB (since 2003), is funded, developed, and hosted by the USDA-ARS at Ames, Iowa. Resources in this dataset: Resource Title: MaizeGDB, the community database for maize genetics and genomics. File Name: Web Page, url: https://maizegdb.org/ Established as a USDA-ARS resource in 2003, MaizeGDB supplies data and resources related to maize. The types of data include genomic, genetic, sequence, germplasm, gene product, metabolic pathways, functional characterization, literature reference, diversity, and expression.

  16. GenBank

    • catalog.data.gov
    • data.virginia.gov
    • +3more
    Updated Jul 26, 2023
    Cite
    National Institutes of Health (NIH) (2023). GenBank [Dataset]. https://catalog.data.gov/dataset/genbank
    Dataset updated
    Jul 26, 2023
    Dataset provided by
    National Institutes of Health (NIH)
    Description

    GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. GenBank is designed to provide and encourage access within the scientific community to the most up to date and comprehensive DNA sequence information.

  17. GenBank

    • dknet.org
    Updated Nov 10, 2024
    Cite
    (2024). GenBank [Dataset]. http://identifiers.org/RRID:SCR_002760
    Dataset updated
    Nov 10, 2024
    Description

    NIH genetic sequence database that provides an annotated collection of all publicly available DNA sequences for almost 280,000 formally described species (Jan 2014). These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assign accession numbers upon data receipt. It is part of the International Nucleotide Sequence Database Collaboration, and daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP.
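
The description above says GenBank is reachable through the NCBI Entrez retrieval system. As a rough illustration only, the sketch below builds a request URL for EFetch, one of the E-utilities that expose Entrez programmatically. The helper name `build_efetch_url` is hypothetical, the accession `U49845` is just an example record, and the script only constructs the URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities EFetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str, rettype: str = "fasta") -> str:
    """Return an EFetch URL for one nucleotide record.

    Hypothetical helper for illustration: it assumes the record lives in
    the 'nuccore' database and that plain-text output is wanted.
    """
    params = {
        "db": "nuccore",      # Entrez nucleotide database
        "id": accession,      # accession number of the record
        "rettype": rettype,   # "fasta" for sequence, "gb" for the flat file
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Example accession chosen for illustration only.
    print(build_efetch_url("U49845"))
```

Opening the printed URL in a browser or passing it to an HTTP client should return the record as text; anyone sending many requests would want to check NCBI's usage guidelines first.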

  18. 9MM Gallus gallus protein BLAST.

    • plos.figshare.com
    txt
    Updated Jun 29, 2023
    Cite
    Matthys G. Potgieter; Andrew J. M. Nel; Suereta Fortuin; Shaun Garnett; Jerome M. Wendoh; David L. Tabb; Nicola J. Mulder; Jonathan M. Blackburn (2023). 9MM Gallus gallus protein BLAST. [Dataset]. http://doi.org/10.1371/journal.pcbi.1011163.s018
    Available download formats: txt
    Dataset updated
    Jun 29, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Matthys G. Potgieter; Andrew J. M. Nel; Suereta Fortuin; Shaun Garnett; Jerome M. Wendoh; David L. Tabb; Nicola J. Mulder; Jonathan M. Blackburn
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Background: Microbiome research is providing important new insights into the metabolic interactions of complex microbial ecosystems involved in fields as diverse as the pathogenesis of human diseases, agriculture and climate change. Poor correlations typically observed between RNA and protein expression datasets make it hard to accurately infer microbial protein synthesis from metagenomic data. Additionally, mass spectrometry-based metaproteomic analyses typically rely on focused search sequence databases based on prior knowledge for protein identification that may not represent all the proteins present in a set of samples. Metagenomic 16S rRNA sequencing only targets the bacterial component, while whole genome sequencing is at best an indirect measure of expressed proteomes. Here we describe a novel approach, MetaNovo, that combines existing open-source software tools to perform scalable de novo sequence tag matching with a novel algorithm for probabilistic optimization of the entire UniProt knowledgebase to create tailored sequence databases for target-decoy searches directly at the proteome level, enabling metaproteomic analyses without prior expectation of sample composition or metagenomic data generation and compatible with standard downstream analysis pipelines.

    Results: We compared MetaNovo to published results from the MetaPro-IQ pipeline on 8 human mucosal-luminal interface samples, with comparable numbers of peptide and protein identifications, many shared peptide sequences and a similar bacterial taxonomic distribution compared to that found using a matched metagenome sequence database, but simultaneously identified many more non-bacterial peptides than the previous approaches. MetaNovo was also benchmarked on samples of known microbial composition against matched metagenomic and whole genomic sequence database workflows, yielding many more MS/MS identifications for the expected taxa, with improved taxonomic representation, while also highlighting previously described genome sequencing quality concerns for one of the organisms, and identifying an experimental sample contaminant without prior expectation.

    Conclusions: By estimating taxonomic and peptide level information directly on microbiome samples from tandem mass spectrometry data, MetaNovo enables the simultaneous identification of peptides from all domains of life in metaproteome samples, bypassing the need for curated sequence databases to search. We show that the MetaNovo approach to mass spectrometry metaproteomics is more accurate than current gold standard approaches of tailored or matched genomic sequence database searches, can identify sample contaminants without prior expectation and yields insights into previously unidentified metaproteomic signals, building on the potential for complex mass spectrometry metaproteomic data to speak for itself.

  19. Human Gene and Protein Database (HGPD)

    • scicrunch.org
    • neuinfo.org
    Updated Nov 23, 2008
    Cite
    (2008). Human Gene and Protein Database (HGPD) [Dataset]. http://identifiers.org/RRID:SCR_002889
    Dataset updated
    Nov 23, 2008
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE. Documented on January 4, 2023. The Human Gene and Protein Database presents SDS-PAGE patterns and other information on human genes and proteins. The HGPD was constructed from full-length cDNAs. For conversion to Gateway entry clones, we first determined an open reading frame (ORF) region in each cDNA meeting the criteria. Those ORF regions were PCR-amplified utilizing selected resource cDNAs as templates. All the details of the construction and utilization of entry clones will be published elsewhere. Amino acid and nucleotide sequences of an ORF for each cDNA and sequence differences of Gateway entry clones from source cDNAs are presented in the GW: Gateway Summary window. Utilizing those clones with a very efficient cell-free protein synthesis system featuring wheat germ, we have produced a large number of human proteins in vitro. Expressed proteins were detected in almost all cases. Proteins in both total and supernatant fractions are shown in the PE: Protein Expression window. In addition, we have also successfully expressed proteins in HeLa cells and determined subcellular localizations of human proteins. These biological data are presented on the frame of cDNA clusters in the Human Gene and Protein Database. To build the basic frame of HGPD, sequences of FLJ full-length cDNAs and others deposited in public databases (Human ESTs, RefSeq, Ensembl, MGC, etc.) are assembled onto the genome sequences (NCBI Build 35 (UCSC hg17)). The majority of analysis data for cDNA sequences in HGPD are shared with the FLJ Human cDNA Database (http://flj.hinv.jp/), constructed as a human cDNA sequence analysis database focusing on mRNA varieties caused by variations in transcription start site (TSS) and splicing.

  20. Data from: Creating, curating, and evaluating a mitogenomic reference...

    • search.dataone.org
    • data.niaid.nih.gov
    • +2more
    Updated Nov 29, 2023
    Cite
    Emily Dziedzic; Brian Sidlauskas; Richard Cronn; James Anthony; Trevan Cornwell; Thomas Friesen; Peter Konstantinidis; Brooke Penaluna; Staci Stein; Taal Levi (2023). Creating, curating, and evaluating a mitogenomic reference database to improve regional species identification using environmental DNA [Dataset]. http://doi.org/10.5061/dryad.2jm63xsv4
    Dataset updated
    Nov 29, 2023
    Dataset provided by
    Dryad Digital Repository
    Authors
    Emily Dziedzic; Brian Sidlauskas; Richard Cronn; James Anthony; Trevan Cornwell; Thomas Friesen; Peter Konstantinidis; Brooke Penaluna; Staci Stein; Taal Levi
    Time period covered
    Jan 1, 2023
    Description

    Species detection using eDNA is revolutionizing global capacity to monitor biodiversity. However, the lack of regional, vouchered, genomic sequence information, especially sequence information that includes intraspecific variation, creates a bottleneck for management agencies wanting to harness the complete power of eDNA to monitor taxa and implement eDNA analyses. eDNA studies depend upon regional databases of mitogenomic sequence information to evaluate the effectiveness of such data to detect and identify taxa. We created the Oregon Biodiversity Genome Project to create a database of complete, nearly error-free mitogenomic sequences for all of Oregon's fishes. We have successfully assembled the complete mitogenomes of 313 specimens of freshwater, anadromous, and estuarine fishes representing 24 families, 55 genera, and 129 species and lineages. Comparative analyses of these sequences illustrate that many regions of the mitogenome are taxonomically informative, that the short (~150 bp) ...

    Voucher Specimen and Tissue Collection: The study area initially encompassed the state of Oregon, the region of interest for our eDNA monitoring program, and expanded to a few sites in northern California and Washington State (Fig 3). To strategize sample collection, we examined historical location records in fish collections such as the Oregon State Ichthyology Collection and conferred with local biologists to identify current distributions. For cases where we knew or suspected that deeply divergent evolutionary lineages existed in the present concept of a species, we aimed to include representatives of all lineages. We ultimately identified 146 native and nonnative freshwater fish species and lineages that are currently found in Oregon and strategized collections to span watersheds throughout the state (Appendix S1). To facilitate consistent sampling, we provided sampling kits (Appendix S2, Box S1) to collectors that contained a 500-mL Nalgene bottle filled with 10% formalin, a 2.0 mL c...

    Microsoft Excel, LibreOffice, or Microsoft's free XLS Viewer can be used to open the Excel files and an unzip utility such as 7-Zip or WinZip can be used to unzip zipped fastas. For pdfs, use Adobe Acrobat Reader. Open Microsoft Word documents using Microsoft Word, OpenOffice Writer or Google Docs.

(2025). NCBI Genome Survey Sequences Database [Dataset]. http://identifiers.org/RRID:SCR_002146

NCBI Genome Survey Sequences Database

RRID:SCR_002146, SCR_015063, nif-0000-20938, NCBI Genome Survey Sequences Database (RRID:SCR_002146), GSS, Entrez GSS, NCBI dbGSS, dbGSS

4 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Jun 17, 2025
