100+ datasets found
  1. NCBI Genome Survey Sequences Database

    • dknet.org
    • rrid.site
    • +2 more
    Updated Aug 15, 2024
    Cite
    (2024). NCBI Genome Survey Sequences Database [Dataset]. http://identifiers.org/RRID:SCR_002146
    Dataset updated
    Aug 15, 2024
    Description

    Database of unannotated, short, single-read, primarily genomic sequences from GenBank, including random survey sequences, clone-end sequences, and exon-trapped sequences. The GSS division of GenBank is similar to the EST division, except that most of the sequences are genomic in origin rather than cDNA (mRNA). Two classes (exon-trapped products and gene-trapped products) may be derived via a cDNA intermediate. Take care when analyzing sequences from either class: a splicing event could have occurred, so the sequence in the record may be interrupted when compared with the genomic sequence. The GSS division contains (but is not limited to) the following types of data:

    * random single-pass read genome survey sequences
    * cosmid/BAC/YAC end sequences
    * exon-trapped genomic sequences
    * Alu PCR sequences
    * transposon-tagged sequences

    Although dbGSS sequences are incorporated into the GSS division of GenBank, annotation in dbGSS is more comprehensive and includes detailed information about the contributors, experimental conditions, and genetic map locations.

  2. Genome Sequence Data Set01

    • catalog.data.gov
    • data.amerigeoss.org
    Updated Nov 12, 2020
    + more versions
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Genome Sequence Data Set01 [Dataset]. https://catalog.data.gov/dataset/genome-sequence-data-set01-d2862
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The FASTA files (Genome_Set01.zip) contain the reference-assisted de novo assemblies (as contigs) of three Escherichia coli isolates. In the accompanying table, rows (yellow) are isolates and columns (green) are attributes of each individual genome. This dataset is associated with the following publication: Gomez-Alvarez, V., and J. Hoelle-Schwalbach. Draft Genome Sequences of Antibiotic-Resistant Escherichia coli Isolates from U.S. Wastewater Treatment Plants. Microbiology Resource Announcements. American Society for Microbiology, Washington, DC, USA, 8(23): e00351-19, (2019).

  3. Genome Sequence Data Set02

    • catalog.data.gov
    • s.cnmilf.com
    Updated Mar 15, 2021
    Cite
    U.S. EPA Office of Research and Development (ORD) (2021). Genome Sequence Data Set02 [Dataset]. https://catalog.data.gov/dataset/genome-sequence-data-set02
    Dataset updated
    Mar 15, 2021
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The Whole Genome Shotgun project has been deposited in DDBJ/ENA/GenBank under the BioProject PRJNA487286 with the following accession numbers CP061840 (chromosome) and CP061841 (plasmid). The raw sequence reads have been submitted to the NCBI SRA under the accession numbers SRR13076822 and SRR13076823. This dataset is associated with the following publication: Gomez-Alvarez, V., L. Boczek, I. Raffenberg, and R. Revetta. Closed Genome and Plasmid Sequences of Legionella pneumophila AW-13-4, Isolated from a Hot Water Loop System of a Large Occupational Building. Microbiology Resource Announcements. American Society for Microbiology, Washington, DC, USA, 10(1): e01276-20, (2021).

  4. Genome Reviews

    • neuinfo.org
    • dknet.org
    • +2 more
    Updated Oct 31, 2005
    Cite
    (2005). Genome Reviews [Dataset]. http://identifiers.org/RRID:SCR_007685
    Dataset updated
    Oct 31, 2005
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE, documented April 24, 2017. The Genome Reviews database provides an up-to-date, standardized and comprehensively annotated view of the genomic sequence of organisms with completely deciphered genomes. Currently, Genome Reviews contains the genomes of archaea, bacteria, bacteriophages and selected eukaryota. Genome Reviews is available as a MySQL relational database, or a flat file format derived from that in the EMBL Nucleotide Sequence Database. An Ensembl-style browser is now available for Genome Reviews, providing a zoomable graphical view of all chromosomes and plasmids represented in the database. The location and structure of all genes is shown and the distribution of features throughout the sequence is displayed.

  5. Data from: SoyBase and the Soybean Breeder's Toolbox

    • agdatacommons.nal.usda.gov
    • s.cnmilf.com
    • +3 more
    bin
    Updated Feb 8, 2024
    Cite
    David M. Grant (2024). SoyBase and the Soybean Breeder's Toolbox [Dataset]. http://doi.org/10.15482/USDA.ADC/1212265
    Available download formats: bin
    Dataset updated
    Feb 8, 2024
    Dataset provided by
    Ag Data Commons
    Authors
    David M. Grant
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Description

    SoyBase is a repository for genetics, genomics and related data resources for soybean. It contains current genetic, physical and genomic sequence maps integrated with qualitative and quantitative traits. SoyBase was established in the 1990s as the USDA Soybean Genetics Database. Originally, it contained only genetic information about soybeans, such as genetic maps and information about the Mendelian genetics of soybean. In time, SoyBase was expanded to include molecular data regarding soybean genes and sequences as they became available. In 2010, the soybean genome sequence was published, and it and supporting gene sequences have been integrated into the SoyBase sequence browser. SoyBase genetic maps were used in the assembly of both the Williams 82 2010 assembly (Wm82.a1.v1) and the newest genome assembly (Wm82.a2.v1). SoyBase also incorporates information about mutant and other soybean genetic stocks and serves as a contact point for ordering strains from those populations. As association analyses continue due to various re-sequencing efforts, SoyBase will also incorporate those data into the soybean genome browser as they become available. Gene expression patterns are also available at SoyBase through the SoyBase expression pages and the Soybean Gene Atlas. Other expression/transcriptome/methylomic data sets also have been and continue to be incorporated into the SoyBase genome browser. Project No: 3625-21000-062-00D. Accession No: 0425040. Resources in this dataset: Resource Title: SoyBase, the USDA-ARS soybean genetics and genomics database web site. File Name: Web Page, url: https://soybase.org

  6. T4-like genome database

    • neuinfo.org
    • rrid.site
    • +2 more
    Updated Nov 1, 2025
    Cite
    (2025). T4-like genome database [Dataset]. http://identifiers.org/RRID:SCR_005367
    Dataset updated
    Nov 1, 2025
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE, documented August 22, 2016. A database of information on bacterial phages. It contains multiple phage genomes, which users can search with BLAST and MegaBLAST, and also hosts a Phage Forum in which users can discuss phage data. Interactive browsing of completed phage genomes is available using the program. The browser allows users to scan the genome for particular features and to download sequence information plus analyses of those features. Views of the genome are generated showing named genes, BLAST similarities to other phages, predicted tRNAs and other sequence features.

  7. ARS Microbial Genomic Sequence Database Server

    • agdatacommons.nal.usda.gov
    • catalog.data.gov
    bin
    Updated Feb 9, 2024
    Cite
    USDA Agricultural Research Service (2024). ARS Microbial Genomic Sequence Database Server [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/ARS_Microbial_Genomic_Sequence_Database_Server/24661200
    Available download formats: bin
    Dataset updated
    Feb 9, 2024
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Authors
    USDA Agricultural Research Service
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Description

    This database server is supported in fulfilment of the research mission of the Mycotoxin Prevention and Applied Microbiology Research Unit at the National Center for Agricultural Utilization Research in Peoria, Illinois. The linked website provides access to gene sequence databases for various groups of microorganisms, such as Streptomyces species or Aspergillus species and their relatives, that are the product of ARS research programs. The sequence databases are organized in the BIGSdb (Bacterial Isolate Genomic Sequence Database) software package developed by Keith Jolley and Martin Maiden at Oxford University. Resources in this dataset: Resource Title: ARS Microbial Genomic Sequence Database Server. File Name: Web Page, url: http://199.133.98.43

  8. 3D-Genomics Database

    • dknet.org
    • scicrunch.org
    • +2 more
    Updated Jan 29, 2022
    Cite
    (2022). 3D-Genomics Database [Dataset]. http://identifiers.org/RRID:SCR_007430
    Dataset updated
    Jan 29, 2022
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE, documented August 29, 2016. Database containing structural annotations for the proteomes of just under 100 organisms. Using data derived from public databases of translated genomic sequences, representatives from the major branches of life are included: Prokaryota, Eukaryota and Archaea. The annotations stored in the database may be accessed in a number of ways; the help page provides information on how to access the database. 3D-GENOMICS is now part of a larger project called e-Protein, which brings together similar databases at three sites: Imperial College London, University College London and the European Bioinformatics Institute. e-Protein's mission statement is "To provide a fully automated distributed pipeline for large-scale structural and functional annotation of all major proteomes via the use of cutting-edge computer GRID technologies." The following databases are incorporated: NRprot, SCOP, ASTRAL, PFAM, Prosite, taxonomy, COG. The following eukaryotic genomes are incorporated: Anopheles gambiae, protein sequences from the mosquito genome; Arabidopsis thaliana, protein sequences from the Arabidopsis genome; Caenorhabditis briggsae, protein sequences from the C. briggsae genome; Caenorhabditis elegans, protein sequences from the worm genome; Ciona intestinalis, protein sequences from the sea squirt genome; Danio rerio, protein sequences from the zebrafish genome; Drosophila melanogaster, protein sequences from the fruitfly genome; Encephalitozoon cuniculi, protein sequences from the E. cuniculi genome; Fugu rubripes, protein sequences from the pufferfish genome; Guillardia theta, protein sequences from the G. theta genome; Homo sapiens, protein sequences from the human genome; Mus musculus, protein sequences from the mouse genome; Neurospora crassa, protein sequences from the N. crassa genome; Oryza sativa, protein sequences from the rice genome; Plasmodium falciparum, protein sequences from the P. falciparum genome; Rattus norvegicus, protein sequences from the rat genome; Saccharomyces cerevisiae, protein sequences from the yeast genome; Schizosaccharomyces pombe, protein sequences from the yeast genome.

  9. The results of whole genome sequence database (the TrueBacTM ID-Genome...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Oh Joo Kweon; Yong Kwan Lim; Hye Ryoun Kim; Tae-Hyoung Kim; Sung-min Ha; Mi-Kyung Lee (2023). The results of whole genome sequence database (the TrueBacTM ID-Genome system) matching for the novel Cupriavidus species. [Dataset]. http://doi.org/10.1371/journal.pone.0232850.t001
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Oh Joo Kweon; Yong Kwan Lim; Hye Ryoun Kim; Tae-Hyoung Kim; Sung-min Ha; Mi-Kyung Lee
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The results of whole genome sequence database (the TrueBacTM ID-Genome system) matching for the novel Cupriavidus species.

  10. High Throughput Genomic Sequences Division

    • rrid.site
    • scicrunch.org
    • +1 more
    Cite
    High Throughput Genomic Sequences Division [Dataset]. http://identifiers.org/RRID:SCR_002150
    Description

    Database of high-throughput genome sequences from large-scale genome sequencing centers, including unfinished and finished sequences. It was created to accommodate a growing need to make unfinished genomic sequence data rapidly available to the scientific community in a coordinated effort among the International Nucleotide Sequence databases: DDBJ, EMBL, and GenBank. Sequences are prepared for submission by using NCBI's software tools Sequin or tbl2asn. Each center has an FTP directory into which new or updated sequence files are placed. Sequence data in this division are available for BLAST homology searches against either the htgs database or the month database, which includes all new submissions for the prior month. Unfinished HTG sequences containing contigs greater than 2 kb are assigned an accession number and deposited in the HTG division. A typical HTG record might consist of all the first-pass sequence data generated from a single cosmid, BAC, YAC, or P1 clone, which together make up more than 2 kb and contain one or more gaps. A single accession number is assigned to this collection of sequences, and each record includes a clear indication of the status (phase 1 or 2) plus a prominent warning that the sequence data are unfinished and may contain errors. The accession number does not change as sequence records are updated; only the most recent version of an HTG record remains in GenBank.

  11. Genome Database for Rosaceae

    • neuinfo.org
    • scicrunch.org
    • +2 more
    Updated Jun 20, 2008
    + more versions
    Cite
    (2008). Genome Database for Rosaceae [Dataset]. http://identifiers.org/RRID:SCR_012756
    Dataset updated
    Jun 20, 2008
    Description

    GDR is a curated and integrated web-based relational database. GDR contains comprehensive data on the genetically anchored peach physical map; annotated EST databases of apple, peach, almond, cherry, rose, raspberry and strawberry; Rosaceae maps and markers; and all publicly available Rosaceae sequences. Annotations of ESTs include contig assembly, putative function, simple sequence repeats, ORFs, Gene Ontology and anchored position to the peach physical map where applicable. Our integrated map viewer provides a graphical interface to the genetic, transcriptome and physical mapping information. We continue to add Rosaceae map data to CMap, a web-based tool that allows users to view comparisons of genetic and physical maps. ESTs, BACs and markers can be queried by various categories and the search result sites are linked to the integrated map viewer or to the WebFPC physical map sites. In addition to browsing and querying the database, users can compare their sequences with the annotated GDR sequences via a dedicated sequence similarity server running either the BLAST or FASTA algorithm, search their sequences for microsatellites using the SSR server or assemble their ESTs using the CAP3 Server.

  12. Data from: Cacao Genome Database

    • agdatacommons.nal.usda.gov
    • datasets.ai
    • +2 more
    bin
    Updated Feb 9, 2024
    + more versions
    Cite
    Raymond J. Schnell; Alan W. Meerow; Tomas Ayala-Silva; Osman Gutierrez; David Kuhn; Cecile L. Tondo; Juan Carlos Motamayor (2024). Cacao Genome Database [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/Cacao_Genome_Database/24852516
    Available download formats: bin
    Dataset updated
    Feb 9, 2024
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Authors
    Raymond J. Schnell; Alan W. Meerow; Tomas Ayala-Silva; Osman Gutierrez; David Kuhn; Cecile L. Tondo; Juan Carlos Motamayor
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Description

    Not only is cacao the basic ingredient in the world’s favorite confection, chocolate, but it provides a livelihood for over 6.5 million farmers in Africa, South America and Asia and ranks as one of the top ten agriculture commodities in the world. Historically, cocoa production has been plagued by serious losses due to pests and diseases. The release of the cacao genome sequence will provide researchers with access to the latest genomic tools, enabling more efficient research and accelerating the breeding process, thereby expediting the release of superior cacao cultivars. The sequenced genotype, Matina 1-6, is representative of the genetic background most commonly found in the cacao producing countries, enabling results to be applied immediately and broadly to current commercial cultivars. Matina 1-6 is highly homozygous, which greatly reduces the complexity of the sequence assembly process. While the sequence provided is a preliminary release, it already covers 92% of the genome, with approximately 35,000 genes. We will continue to refine the assembly and annotation, working toward a complete finished sequence. Updates will be made available via the main project website. Resources in this dataset: Resource Title: Cacao Genome Database. File Name: Web Page, url: http://www.cacaogenomedb.org/

  13. Data from: Pinus taeda Genome sequencing

    • agdatacommons.nal.usda.gov
    • datasetcatalog.nlm.nih.gov
    bin
    Updated Mar 11, 2025
    Cite
    UC Davis; TreeGenes Database (2025). Pinus taeda Genome sequencing [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/Pinus_taeda_Genome_sequencing/25079168
    Available download formats: bin
    Dataset updated
    Mar 11, 2025
    Dataset provided by
    National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/)
    Authors
    UC Davis; TreeGenes Database
    License

    https://rightsstatements.org/vocab/UND/1.0/

    Description

    Development of a high quality reference genome sequence for loblolly pine, Douglas-fir and sugar pine by means that can serve as a model approach for sequencing other large, complex genomes and empower the forest tree biology research community and the broader biological research community in the practical use and application of this resource.

  14. Gene database of genes on different human chromosomes

    • figshare.com
    zip
    Updated Feb 12, 2021
    Cite
    Wenfa Ng (2021). Gene database of genes on different human chromosomes [Dataset]. http://doi.org/10.6084/m9.figshare.13932119.v1
    Available download formats: zip
    Dataset updated
    Feb 12, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Wenfa Ng
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Since the advent of the genomics age, which began in the 1990s with the sequencing of a couple of model bacterial and eukaryotic genomes, humans have been on a quest to sequence many species in our ecosystems to find commonalities and differences in sequence that help explain phenotypes. This led to the field of functional genomics, which gave us the capability to annotate a genome automatically using sequence homology as a probe. This work sought to provide a database of all genes in the human genome at a granular level by categorizing the genetic repertoire of humans at the chromosomal level. Specifically, in-house MATLAB genome analysis software was used to parse the annotated genome sequence file of each chromosome of the human genome. The variables output for each gene include gene name, gene function, promoter sequence and gene sequence. Such information, when aggregated at the level of chromosomes and of the entire genome, should inform further studies seeking to unravel the mysteries that link gene sequence, gene expression, cell differentiation, and organismal developmental trajectories and phenotypes.

  15. GenBank

    • rrid.site
    • dknet.org
    Updated Jan 29, 2022
    Cite
    (2022). GenBank [Dataset]. http://identifiers.org/RRID:SCR_002760
    Dataset updated
    Jan 29, 2022
    Description

    NIH genetic sequence database that provides an annotated collection of all publicly available DNA sequences for almost 280,000 formally described species (Jan 2014). These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assign accession numbers upon data receipt. It is part of the International Nucleotide Sequence Database Collaboration, and daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP.

  16. Animal Genome Database

    • neuinfo.org
    • rrid.site
    • +1 more
    Updated Jan 29, 2022
    Cite
    (2022). Animal Genome Database [Dataset]. http://identifiers.org/RRID:SCR_008165
    Dataset updated
    Jan 29, 2022
    Description

    Database of comparative gene mapping between species to assist the mapping of genes related to phenotypic traits in livestock. The linkage maps, cytogenetic maps, and polymerase chain reaction primers of pig, cattle, mouse and human, together with their references, are included in the database, and the correspondence among species is specified in the database. AGP is an animal genome database developed on a Unix workstation and maintained by a relational database management system. It is a joint project of the National Institute of Agrobiological Sciences (NIAS) and the Institute of the Society for Techno-innovation of Agriculture, Forestry and Fisheries (STAFF-Institute), in cooperation with other related research institutes. AGP also contains the Pig Expression Data Explorer (PEDE), a database of porcine EST collections derived from full-length cDNA libraries and full-length sequences of the cDNA clones picked from the EST collection. The EST sequences have been clustered and assembled, and their similarity to sequences in RefSeq and UniGene has been determined. The PEDE database system was constructed to store sequences and similarity data of swine full-length cDNA libraries and to make them available to users. It provides interfaces for keyword and ID searches of BLAST results and enables users to obtain sequence data and names of clones of interest. Putative SNPs in EST assemblies have been classified according to breed specificity and their effect on coding amino acids, and the assemblies are equipped with an SNP search interface. The database contains porcine nucleotide sequences and cDNA clones that are ready for analyses such as expression in mammalian cells, because of their high likelihood of containing full-length CDS. PEDE will be useful for researchers who want to explore genes that may be responsible for traits such as disease susceptibility. The database also offers information regarding major and minor porcine-specific antigens, which might be investigated in regard to the use of pigs as models in various medical research applications.

  17. 9MM Gallus gallus protein BLAST (tabular).

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated Jun 29, 2023
    + more versions
    Cite
    Matthys G. Potgieter; Andrew J. M. Nel; Suereta Fortuin; Shaun Garnett; Jerome M. Wendoh; David L. Tabb; Nicola J. Mulder; Jonathan M. Blackburn (2023). 9MM Gallus gallus protein BLAST (tabular). [Dataset]. http://doi.org/10.1371/journal.pcbi.1011163.s014
    Available download formats: xlsx
    Dataset updated
    Jun 29, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Matthys G. Potgieter; Andrew J. M. Nel; Suereta Fortuin; Shaun Garnett; Jerome M. Wendoh; David L. Tabb; Nicola J. Mulder; Jonathan M. Blackburn
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Background: Microbiome research is providing important new insights into the metabolic interactions of complex microbial ecosystems involved in fields as diverse as the pathogenesis of human diseases, agriculture and climate change. Poor correlations typically observed between RNA and protein expression datasets make it hard to accurately infer microbial protein synthesis from metagenomic data. Additionally, mass spectrometry-based metaproteomic analyses typically rely on focused search sequence databases based on prior knowledge for protein identification that may not represent all the proteins present in a set of samples. Metagenomic 16S rRNA sequencing only targets the bacterial component, while whole genome sequencing is at best an indirect measure of expressed proteomes. Here we describe a novel approach, MetaNovo, that combines existing open-source software tools to perform scalable de novo sequence tag matching with a novel algorithm for probabilistic optimization of the entire UniProt knowledgebase to create tailored sequence databases for target-decoy searches directly at the proteome level, enabling metaproteomic analyses without prior expectation of sample composition or metagenomic data generation and compatible with standard downstream analysis pipelines. Results: We compared MetaNovo to published results from the MetaPro-IQ pipeline on 8 human mucosal-luminal interface samples, with comparable numbers of peptide and protein identifications, many shared peptide sequences and a similar bacterial taxonomic distribution compared to that found using a matched metagenome sequence database, but simultaneously identified many more non-bacterial peptides than the previous approaches. MetaNovo was also benchmarked on samples of known microbial composition against matched metagenomic and whole genomic sequence database workflows, yielding many more MS/MS identifications for the expected taxa, with improved taxonomic representation, while also highlighting previously described genome sequencing quality concerns for one of the organisms, and identifying an experimental sample contaminant without prior expectation. Conclusions: By estimating taxonomic and peptide level information directly on microbiome samples from tandem mass spectrometry data, MetaNovo enables the simultaneous identification of peptides from all domains of life in metaproteome samples, bypassing the need for curated sequence databases to search. We show that the MetaNovo approach to mass spectrometry metaproteomics is more accurate than current gold standard approaches of tailored or matched genomic sequence database searches, can identify sample contaminants without prior expectation and yields insights into previously unidentified metaproteomic signals, building on the potential for complex mass spectrometry metaproteomic data to speak for itself.

  18. Data from: Towards understanding the first genome sequence of a crenarchaeon...

    • catalog.data.gov
    • odgavaprod.ogopendata.com
    Updated Sep 7, 2025
    Cite
    National Institutes of Health (2025). Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs) [Dataset]. https://catalog.data.gov/dataset/towards-understanding-the-first-genome-sequence-of-a-crenarchaeon-by-genome-annotation-usi
    Dataset updated
    Sep 7, 2025
    Dataset provided by
    National Institutes of Health
    Description

    Background: Standard archival sequence databases have not been designed as tools for genome annotation and are far from being optimal for this purpose. We used the database of Clusters of Orthologous Groups of proteins (COGs) to reannotate the genomes of two archaea, Aeropyrum pernix, the first member of the Crenarchaea to be sequenced, and Pyrococcus abyssi. Results: A. pernix and P. abyssi proteins were assigned to COGs using the COGNITOR program; the results were verified on a case-by-case basis and augmented by additional database searches using the PSI-BLAST and TBLASTN programs. Functions were predicted for over 300 proteins from A. pernix, which could not be assigned a function using conventional methods with a conservative sequence similarity threshold, an approximately 50% increase compared to the original annotation. A. pernix shares most of the conserved core of proteins that were previously identified in the Euryarchaeota. Cluster analysis or distance matrix tree construction based on the co-occurrence of genomes in COGs showed that A. pernix forms a distinct group within the archaea, although grouping with the two species of Pyrococci, indicative of similar repertoires of conserved genes, was observed. No indication of a specific relationship between Crenarchaeota and eukaryotes was obtained in these analyses. Several proteins that are conserved in Euryarchaeota and most bacteria are unexpectedly missing in A. pernix, including the entire set of de novo purine biosynthesis enzymes, the GTPase FtsZ (a key component of the bacterial and euryarchaeal cell-division machinery), and the tRNA-specific pseudouridine synthase, previously considered universal. A. pernix is represented in 48 COGs that do not contain any euryarchaeal members. Many of these proteins are TCA cycle and electron transport chain enzymes, reflecting the aerobic lifestyle of A. pernix. 
Conclusions: Special-purpose databases organized on the basis of phylogenetic analysis and carefully curated with respect to known and predicted protein functions provide for a significant improvement in genome annotation. A differential genome display approach helps in a systematic investigation of common and distinct features of gene repertoires and in some cases reveals unexpected connections that may be indicative of functional similarities between phylogenetically distant organisms and of lateral gene exchange.

  19. r

    Sequencing of Idd regions in the NOD mouse genome

    • rrid.site
    • neuinfo.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). Sequencing of Idd regions in the NOD mouse genome [Dataset]. http://identifiers.org/RRID:SCR_001483
    Dataset updated
    Jan 29, 2022
    Description

    Genetic variations associated with type 1 diabetes, identified by sequencing regions of the non-obese diabetic (NOD) mouse genome and comparing them with the same regions of a diabetes-resistant C57BL/6J reference mouse, allowing identification of single nucleotide polymorphisms (SNPs) or other genomic variations putatively associated with diabetes in mice. Finished clones from the targeted insulin-dependent diabetes (Idd) candidate regions are displayed in the NOD clone sequence section of the website, where they can be downloaded either as individual clone sequences or as larger contigs that make up the accession golden path (AGP). All sequences are publicly available via the International Nucleotide Sequence Database Collaboration (INSDC). Two NOD mouse BAC libraries were constructed and the BAC ends sequenced. Clones from the DIL NOD BAC library, constructed by RIKEN Genomic Sciences Centre (Japan) in conjunction with the Diabetes and Inflammation Laboratory (DIL) (University of Cambridge) from the NOD/MrkTac mouse strain, are designated DIL. Clones from the CHORI-29 NOD BAC library, constructed by Pieter de Jong (Children's Hospital, Oakland, California, USA) from the NOD/ShiLtJ mouse strain, are designated CHORI-29. All NOD mouse BAC end-sequences have been submitted to the INSDC and deposited in the NCBI trace archive. A clone map was generated from these two libraries by mapping the BAC end-sequences to the latest assembly of the C57BL/6J mouse reference genome sequence. These BAC end-sequence alignments can be visualized in the Ensembl mouse genome browser, where the alignments of both NOD BAC libraries can be accessed through the Distributed Annotation System (DAS). The Mouse Genomes Project has used the Illumina platform to sequence the entire NOD/ShiLtJ genome, which should help to position unaligned BAC end-sequences in novel non-reference regions of the NOD genome. Further information about the BAC end-sequences, such as their alignment, variation data and Ensembl gene coverage, can be obtained from the NOD mouse ftp site.

  20. r

    China National Center for Bioinformation Genome Sequence Archive for Human...

    • rrid.site
    • scicrunch.org
    Updated Jul 14, 2025
    Cite
    (2025). China National Center for Bioinformation Genome Sequence Archive for Human database [Dataset]. http://identifiers.org/RRID:SCR_027207/resolver?q=*&i=rrid
    Dataset updated
    Jul 14, 2025
    Description

    Data repository for archiving raw sequence data that provides data storage and sharing services for scientific communities worldwide. The repository specializes in human genetic data derived from biomedical research.

NCBI Genome Survey Sequences Database

RRID:SCR_002146, SCR_015063, nif-0000-20938, NCBI Genome Survey Sequences Database (RRID:SCR_002146), GSS, Entrez GSS, NCBI dbGSS, dbGSS

5 scholarly articles cite this dataset (View in Google Scholar)