100+ datasets found
  1. ComBase: A Combined Database For Predictive Microbiology

    • rrid.site
    • scicrunch.org
    Updated Jan 29, 2022
    Cite
    (2022). ComBase: A Combined Database For Predictive Microbiology [Dataset]. http://identifiers.org/RRID:SCR_008181
    Description

    A database of information about how microorganisms respond to different environments. The information in ComBase is referred to as quantitative microbiological data because it describes how levels of microorganisms, both spoilage organisms and pathogens, change over time. The primary goals of the ComBase consortium are to make specific microbiological information easier to locate, to provide a faster means of comparing data from different laboratories, and to reduce unnecessary redundancy in microbiological studies. ComBase was launched in 2003. The ComBase Initiative is a collaboration between the Food Standards Agency and the Institute of Food Research from the United Kingdom; the USDA Agricultural Research Service and its Eastern Regional Research Center from the United States; and the Food Safety Center in Australia. Its purpose is to make data and predictive tools on microbial responses to food environments freely available via web-based software. The ComBase Database (accessible via the ComBase Browser) consists of thousands of microbial growth and survival curves collated in research establishments and from publications. These curves form the basis of numerous microbial models presented in ComBase Predictor, a tool for industry, academia, and regulatory agencies. They can be used to develop new food technologies while maintaining food safety, in teaching and research, to assess microbial risk in foods, and to set up new guidelines.

  2. ARS Microbial Genomic Sequence Database Server

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). ARS Microbial Genomic Sequence Database Server [Dataset]. https://catalog.data.gov/dataset/ars-microbial-genomic-sequence-database-server-1b81c
    Dataset provided by
    Agricultural Research Service
    Description

    This database server is supported in fulfilment of the research mission of the Mycotoxin Prevention and Applied Microbiology Research Unit at the National Center for Agricultural Utilization Research in Peoria, Illinois. The linked website provides access to gene sequence databases for various groups of microorganisms, such as Streptomyces species or Aspergillus species and their relatives, that are the product of ARS research programs. The sequence databases are organized in the BIGSdb (Bacterial Isolate Genomic Sequence Database) software package developed by Keith Jolley and Martin Maiden at Oxford University.

    Resources in this dataset: Resource Title: ARS Microbial Genomic Sequence Database Server. File Name: Web Page, url: http://199.133.98.43

  3. Data from: Fermented Foods Microbial Genomes Database

    • zenodo.org
    application/gzip, tsv
    Updated Jul 2, 2025
    Cite
    Elizabeth McDaniel; Rachel Dutton (2025). Fermented Foods Microbial Genomes Database [Dataset]. http://doi.org/10.5281/zenodo.15794524
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Elizabeth McDaniel; Rachel Dutton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Fermented Foods Microbial Genomes Database

    This database contains 13,850 microbial genomes assembled from various fermented foods and associated curated metadata.

    We have also clustered the database at 95% identity to create a species-representative database, and at 99% identity to create a "strain"-representative database, since we hypothesize that many bioactivities and phenotypes of fermented food microbes are important at the strain level.

    This GitHub repository documents how publicly available genomes and metagenome-assembled genomes were sourced and curated. This GitHub repository documents how the associated metadata was curated.

    This database largely pulls from existing genome resources, and we curated this database specifically for fermented foods. If you use this database, please cite the following genome databases/resources:

    1. MiFoDB, a workflow for microbial food metagenomic characterization, enables high-resolution analysis of fermented food microbial dynamics. Elisa B. Caffrey, Matthew R. Olm, Caroline Isabel Kothe, Joshua Evans, Justin L. Sonnenburg. bioRxiv 2024.03.29.587370; doi: https://doi.org/10.1101/2024.03.29.587370
    2. Unexplored microbial diversity from 2,500 food metagenomes and links with the human microbiome. Carlino, Niccolo; Alvarez-Ordonez, Avelino; et al. Cell, Volume 187, Issue 20, 5775-5795.e15, AND the associated Zenodo release: Master Consortium. (2024). Unexplored microbial diversity from 2,500 food metagenomes and links with the human microbiome. Zenodo. https://doi.org/10.5281/zenodo.13285428

    We are incredibly grateful to these groups and countless others for taking the time to make their data publicly available. For each genome, the metadata includes the original DOI and study link from which the genome was generated, and whether it was collated into one of the two larger databases above. If you use or analyze a specific subset of genomes, please cite those studies to credit the groups that generated the data and made it publicly available.

    Subsetting the Database to Species/Strain-Resolved Representatives or a Custom Set

    We provide the entire set of 13,850 microbial genomes in a single tar archive for download, along with tar archives of the genomes clustered at 95% and 99% identity. If you download the entire database but want to use only a subset, such as the species-representative (clustered at 95% ANI) or "strain-representative" (clustered at 99% ANI) genomes, or a custom list that you provide, you can use our helper script to subset the genomes.

    usage: subset_genomes.py [-h] [--rep-column {rep_95id,rep_99id}]
                             [--id-column ID_COLUMN] [--dry-run]
                             [--genome-list GENOME_LIST]
                             metadata_tsv all_genomes_dir output_dir

    Subset representative genomes (species/strain) from a genome set using
    metadata.

    positional arguments:
      metadata_tsv          Path to metadata TSV file
      all_genomes_dir       Directory containing all .fa genome files
      output_dir            Directory to copy representative genomes to

    optional arguments:
      -h, --help            show this help message and exit
      --rep-column {rep_95id,rep_99id}
                            Column in metadata to use for representatives (e.g.,
                            rep_95id or rep_99id)
      --id-column ID_COLUMN
                            Column in metadata with genome file IDs (default:
                            mag_id)
      --dry-run             Only print what would be copied, don't actually copy
      --genome-list GENOME_LIST
                            Optional: Path to file with list of genome IDs or
                            filenames to subset (one per line)
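    The help text documents the helper script's interface but not its internals. The Python below is a minimal sketch of the same subsetting idea, not the authors' `subset_genomes.py`; the function names are hypothetical. It assumes, without confirmation from the source, that the representative column (`rep_95id` or `rep_99id`) holds the ID of each genome's cluster representative, that a representative is the row whose `mag_id` equals that value, and that genome files are named `<ID>.fa`.

```python
import csv
import shutil
from pathlib import Path


def representative_ids(rows, id_column="mag_id", rep_column="rep_95id"):
    """IDs of cluster representatives among metadata rows (dicts).

    ASSUMPTION: `rep_column` holds the ID of each genome's cluster
    representative, so a genome is a representative when both columns match.
    """
    return {row[id_column] for row in rows if row[id_column] == row[rep_column]}


def ids_from_list(lines):
    """Genome IDs from a one-per-line list of IDs or filenames ('.fa' dropped)."""
    return {Path(line.strip()).name.removesuffix(".fa")
            for line in lines if line.strip()}


def subset_genomes(metadata_tsv, all_genomes_dir, output_dir,
                   rep_column="rep_95id", id_column="mag_id",
                   genome_list=None, dry_run=False):
    """Copy the selected genomes; return the filenames copied (or to be copied)."""
    if genome_list is not None:
        # A custom list takes precedence over the representative column.
        with open(genome_list) as fh:
            keep = ids_from_list(fh)
    else:
        with open(metadata_tsv, newline="") as fh:
            reader = csv.DictReader(fh, delimiter="\t")
            keep = representative_ids(reader, id_column, rep_column)

    out_dir = Path(output_dir)
    if not dry_run:
        out_dir.mkdir(parents=True, exist_ok=True)
    selected = []
    for genome_id in sorted(keep):
        src = Path(all_genomes_dir) / f"{genome_id}.fa"  # ASSUMPTION: <ID>.fa naming
        if not src.exists():
            continue  # no genome file for this ID; skipped silently in this sketch
        if dry_run:
            print(f"would copy {src} -> {out_dir / src.name}")
        else:
            shutil.copy2(src, out_dir / src.name)
        selected.append(src.name)
    return selected
```

    Keeping the selection logic (`representative_ids`, `ids_from_list`) separate from file copying makes it easy to check which genomes would be chosen before touching the disk, which is the same purpose the script's `--dry-run` flag serves.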

    KBase Fermented Foods Microbial Genomes Database Narrative

    We have uploaded the "strain-representative" set of ~4300 genomes to KBase as a public narrative.

    KBase is a community-driven platform for open science research in systems biology. KBase lets you run bioinformatics tools on large datasets using freely available Department of Energy high-performance computing resources, and it supports open sharing of research outputs and collaborative work. You can sign up for a KBase account with any email address. You do not need to be affiliated with an academic institution or government organization to use KBase, and you can reside outside the United States.
    This platform provides not only additional access to the Fermented Foods Microbial Genomes Database but also access to open-source bioinformatics tools and DOE high-performance computing resources for running reproducible analyses. You can use this narrative to explore the database, to compare your own genomes against genomes in the database, and/or as a teaching resource.

  4. MiST - Microbial Signal Transduction database

    • neuinfo.org
    • dknet.org
    Updated Jan 29, 2022
    Cite
    (2022). MiST - Microbial Signal Transduction database [Dataset]. http://identifiers.org/RRID:SCR_003166
    Description

    A database of the signal transduction proteins in complete and draft bacterial and archaeal genomes. The MiST2 database identifies and catalogs the repertoire of signal transduction proteins in microbial genomes.

  5. Microbial Protein Interaction Database

    • bioregistry.io
    Updated Dec 18, 2021
    Cite
    (2021). Microbial Protein Interaction Database [Dataset]. https://bioregistry.io/registry/mpid
    Description

    The microbial protein interaction database (MPIDB) provides physical microbial interaction data. The interactions are manually curated from the literature or imported from other databases, and are linked to supporting experimental evidence, as well as to evidence based on interaction conservation, protein complex membership, and 3D domain contacts.

  6. Data from: ComBase: A Web Resource for Quantitative and Predictive Food...

    • agdatacommons.nal.usda.gov
    • datasets.ai
    bin
    Updated Nov 21, 2025
    Cite
    ComBase Team (2025). ComBase: A Web Resource for Quantitative and Predictive Food Microbiology [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/ComBase_A_Web_Resource_for_Quantitative_and_Predictive_Food_Microbiology/25212404
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Authors
    ComBase Team
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    ComBase includes a systematically formatted database of quantified microbial responses to the food environment with more than 65,000 records, and is used for:

    • Informing the design of food safety risk management plans
    • Producing Food Safety Plans and HACCP plans
    • Reducing food waste
    • Assessing microbiological risk in foods

    The ComBase Browser lets you search thousands of microbial growth and survival curves collated in research establishments and from publications. The ComBase Predictive Models are a collection of software tools, based on ComBase data, that predict the growth or inactivation of microorganisms as a function of environmental factors such as temperature, pH, and water activity in broth. Users can also contribute growth or inactivation data via the Donate Data page, which includes instructional videos, a data template and sample, and an Excel demo file with data and macros for checking data format and syntax.

    Resources in this dataset: Resource Title: Website Pointer to ComBase. File Name: Web Page, url: https://www.combase.cc/index.php/en/

    ComBase is an online tool for quantitative food microbiology. Its main features, the ComBase database and the ComBase models, can be accessed from any web platform, including mobile devices. ComBase focuses on describing and predicting how microorganisms survive and grow under a variety of primarily food-related conditions. It helps food companies understand safer ways of producing and storing foods, including developing new food products and reformulating foods, designing challenge test protocols, and producing Food Safety Plans, and it helps public health organizations develop science-based food policies through quantitative risk assessment. Over 60,000 records have been deposited into ComBase, describing how food environments, such as temperature, pH, and water activity, as well as other factors (e.g. preservatives and atmosphere), affect the growth of bacteria. Each data record shows how bacterial populations change for a particular combination of environmental factors. Mathematical models (the ComBase Predictor and Food models) were developed on systematically generated data to predict how various organisms grow or survive under various conditions.
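    Growth models in predictive microbiology are commonly split into a secondary model, which maps conditions such as temperature, pH, and water activity to growth parameters, and a primary model, which turns those parameters into a growth curve over time. As a sketch of the primary-model step only, the function below implements the Baranyi and Roberts growth model, a widely used primary model. It is not ComBase's code, the function names are hypothetical, and the parameter values in the demo are made up for illustration.

```python
import math


def baranyi_log_count(t, y0, ymax, mu_max, h0, m=1.0):
    """Baranyi and Roberts primary growth model (explicit solution).

    t       : time (same unit as 1/mu_max, e.g. hours)
    y0, ymax: natural log of the initial and maximum cell concentrations
    mu_max  : maximum specific growth rate (natural-log units per time)
    h0      : physiological state of the inoculum; lag = h0 / mu_max
    m       : curvature at the transition to stationary phase (often 1)
    """
    # A(t): time "delayed" by the lag phase; A(0) = 0 and A(t) ~ t - lag later.
    a = t + math.log(math.exp(-mu_max * t) + math.exp(-h0)
                     - math.exp(-mu_max * t - h0)) / mu_max
    # Exponential growth braked as the population approaches ymax.
    braking = math.log1p(math.expm1(m * mu_max * a) / math.exp(m * (ymax - y0))) / m
    return y0 + mu_max * a - braking


def lag_time(mu_max, h0):
    """Lag duration implied by the model parameters."""
    return h0 / mu_max


if __name__ == "__main__":
    # Illustrative parameters only: 10^3 to 10^9 cells/mL, mu_max 0.4 per hour.
    y0, ymax = math.log(1e3), math.log(1e9)
    for hours in (0, 5, 10, 20, 40, 60):
        log10_count = baranyi_log_count(hours, y0, ymax, 0.4, 2.0) / math.log(10)
        print(f"{hours:>3} h: log10 N = {log10_count:.2f}")
```

    The curve starts at `y0`, stays nearly flat for roughly `h0 / mu_max` time units, grows at about `mu_max`, and levels off at `ymax`. A secondary model would supply `mu_max` (and often the lag) for a given temperature, pH, and water activity.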

  7. National Microbial Germplasm Program

    • agdatacommons.nal.usda.gov
    • catalog.data.gov
    bin
    Updated Nov 21, 2025
    Cite
    USDA ARS National Germplasm Resources Laboratory (2025). National Microbial Germplasm Program [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/National_Microbial_Germplasm_Program/24661746
    Dataset provided by
    National Germplasm Resources Laboratory
    Authors
    USDA ARS National Germplasm Resources Laboratory
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    The goal of the National Microbial Germplasm Program is to ensure that the genetic diversity of agriculturally important microorganisms is maintained to enhance agricultural efficiency and profitability. The program collects, authenticates, and characterizes potentially useful microbial germplasm; preserves microbial genetic diversity; and facilitates the distribution and use of microbial germplasm for research and industry. The Agricultural Research Service maintains several microbial germplasm collections, including:

    • USDA ARS Culture Collection
    • USDA ARS Collection of Entomopathogenic Fungal Cultures (ARSEF)
    • Rhizobium Database (query or download)
    • US National Fungus Collections

    Resources in this dataset: Resource Title: National Microbial Germplasm Program. File Name: Web Page, url: https://www.ars-grin.gov/Collections#microbial-germplasm (main web site for the National Microbial Germplasm Program, with links to component databases/collections)

  8. RefSeq Datasets for benchmarking - Ho et al.

    • figshare.com
    txt
    Updated May 31, 2023
    Cite
    Siu Fung Ho (2023). RefSeq Datasets for benchmarking - Ho et al. [Dataset]. http://doi.org/10.6084/m9.figshare.19739884.v1
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Siu Fung Ho
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All complete phage genomes deposited in RefSeq between 1 January 2020 and 12 August 2021 were quality-controlled, dereplicated, and fragmented to create a positive set of artificial contigs. These contigs were also randomly shuffled to produce a further negative control.

    All complete bacterial and archaeal chromosomes deposited in RefSeq between 1 January 2020 and 12 August 2021 were quality-controlled, dereplicated, subsetted, and fragmented to create a negative set of artificial contigs.

    All complete bacterial and archaeal plasmids deposited in RefSeq between 1 January 2020 and 12 August 2021 were quality-controlled, dereplicated, subsetted, and fragmented to create a negative set of artificial contigs.
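    The fragment-and-shuffle step can be made concrete with a minimal sketch. The description does not state contig lengths, step sizes, or the shuffling method, so the fixed-length windows and the mononucleotide shuffle below (which preserves base composition but not dinucleotide frequencies) are assumptions for illustration, and the function names are hypothetical.

```python
import random


def fragment(seq, length, step=None):
    """Cut a genome into artificial contigs of exactly `length` bp.

    `step` defaults to `length` (non-overlapping windows); a trailing piece
    shorter than `length` is dropped.
    """
    step = step or length
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]


def shuffled(contig, rng):
    """Randomly permute a contig's bases, keeping its base composition."""
    bases = list(contig)
    rng.shuffle(bases)
    return "".join(bases)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the negative control is reproducible
    toy_genome = "".join(rng.choice("ACGT") for _ in range(50))
    positives = fragment(toy_genome, 20, step=10)
    negatives = [shuffled(c, rng) for c in positives]
    print(len(positives), len(negatives))
```

    Shuffling keeps each contig's length and GC content, so a classifier cannot separate positives from the shuffled controls on those simple statistics alone.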

  9. Sources of microbial reference materials

    • search.dataone.org
    • datadryad.org
    Updated Jul 17, 2025
    Cite
    Rui P. A. Pereira; John Bagnoli; Lisa Karstens; Kris Locken; Katherine A. Maki; Hena Ramay; Adam Rivers; Stephanie Servetas; Kezia Valyi; Yan Wang; Meredith L. Carpenter; Denise M. O'Sullivan; Katrine L. Whiteson; Amy D. Willis (2025). Sources of microbial reference materials [Dataset]. http://doi.org/10.5061/dryad.m63xsj45z
    Dataset provided by
    Dryad Digital Repository
    Authors
    Rui P. A. Pereira; John Bagnoli; Lisa Karstens; Kris Locken; Katherine A. Maki; Hena Ramay; Adam Rivers; Stephanie Servetas; Kezia Valyi; Yan Wang; Meredith L. Carpenter; Denise M. O'Sullivan; Katrine L. Whiteson; Amy D. Willis
    Time period covered
    Jan 1, 2023
    Description

    Despite the importance of the microbiome in a wide array of human and environmental health settings, challenges remain in taking accurate and precise measurements of microbial communities. These challenges can be partially addressed through the use of "reference materials," which we interpret as any physical material that can be used for quality control, validation, diagnostics, and standardization in metagenomic, microbiome, or multi-omics workflows. As members of the International Microbiome and Multi'Omics Standards Alliance (IMMSA) Reference Materials Working Group, we collated a list of available sources of microbial reference material standards. Each entry in our list includes a description, type of material, availability, storage requirements, biosafety level, species richness, and more. Due to the geographical composition of the working group, the list of materials may be biased towards materials that are available in regions of North America and Wes...

    See the README for a complete description, including information on references used to compile the spreadsheet.

    Sources of microbial reference materials

    https://doi.org/10.5061/dryad.m63xsj45z

    This datasheet lists sources and descriptions of microbial reference material standards, which we interpret as any physical material that can be used for quality control, validation, diagnostics, and standardization in metagenomic, microbiome, or multi-omics workflows. Each entry in our list includes a description, type of material, availability, storage requirements, biosafety level, species richness, and more.

    Description of the data and file structure

    Our submission is a single datasheet. Each row in the datasheet corresponds to a microbial reference material standard, and each column corresponds to a specific descriptor for that standard. Cells list "unknown" (or are left blank) when it was not possible to find the needed information from publicly available provider descriptions. Information for spreadsheet v1.1 was collated in August 2022, and can be ...
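    Because missing information appears either as "unknown" or as a blank cell, it can help to normalize both to a single missing value when loading the datasheet. The sketch below assumes the datasheet has been exported to a well-formed CSV file; the column names in the demo are hypothetical, not the real headers, and the function names are illustrative.

```python
import csv
import io

# Cell values treated as "information not available" (compared case-insensitively).
MISSING = {"", "unknown"}


def clean(value):
    """Return None for blank/'unknown' cells, else the stripped text."""
    if value is None:  # short rows: csv.DictReader fills missing cells with None
        return None
    value = value.strip()
    return None if value.lower() in MISSING else value


def load_datasheet(fh):
    """Read a CSV export of the datasheet into one dict per reference material."""
    return [{col: clean(val) for col, val in row.items()} for row in csv.DictReader(fh)]


def missing_fraction(rows, column):
    """Fraction of reference materials with no information in `column`."""
    return sum(row[column] is None for row in rows) / len(rows) if rows else 0.0


if __name__ == "__main__":
    # Toy CSV with hypothetical column names, not the real datasheet's headers.
    demo = io.StringIO("Name,Biosafety level\nMock A,unknown\nMock B,1\nMock C,\n")
    rows = load_datasheet(demo)
    print(missing_fraction(rows, "Biosafety level"))
```

    Treating both markers as `None` means per-column completeness can be computed in one pass, which is useful when filtering standards by a descriptor such as biosafety level.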

  10. MARMICRODB database for taxonomic classification of (marine) metagenomes

    • zenodo.org
    application/gzip, bin +3
    Updated Mar 20, 2020
    Cite
    Shane L Hogle (2020). MARMICRODB database for taxonomic classification of (marine) metagenomes [Dataset]. http://doi.org/10.5281/zenodo.3520509
    Available download formats: bin, application/gzip, tsv, html, bz2
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Shane L Hogle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction:
    This sequence database (MARMICRODB) was introduced in the publication JW Becker, SL Hogle, K Rosendo, and SW Chisholm. 2019. Co-culture and biogeography of Prochlorococcus and SAR11. ISME J. doi:10.1038/s41396-019-0365-4. Please see the original publication and its associated supplementary material for the original description of this resource.

    Motivation:
    We needed a reference database to annotate shotgun metagenomes from the Tara Oceans project [1], the GEOTRACES cruises GA02, GA03, GA10, and GP13, and the HOT and BATS time series [2]. Our interests are primarily in quantifying and annotating the free-living, oligotrophic bacterial groups Prochlorococcus, Pelagibacterales/SAR11, SAR116, and SAR86 from these samples using the protein classifier tool Kaiju [3]. Kaiju's sensitivity and classification accuracy depend on the composition of the reference database, and the highest sensitivity is achieved when the reference database comprehensively represents the taxa expected in the environment or sample of interest. However, the algorithm slows down as the database grows. We therefore aimed to create a reference database that maximized the representation of sequences from marine bacteria, archaea, and microbial eukaryotes while minimizing (but not excluding) sequences from clinical, industrial, and terrestrial host-associated samples.

    Results/Description:
    MARMICRODB consists of 56 million non-redundant protein sequences from 18769 bacterial/archaeal/eukaryote genome and transcriptome bins and 7492 viral genomes, optimized for use with the protein homology classifier Kaiju [3]. To ensure maximum representation of marine bacteria, archaea, and microbial eukaryotes, we included translated genes/transcripts from 5397 representative "specI" species clusters from the proGenomes database [4]; 113 transcriptomes from the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) [5]; 10509 metagenome-assembled genomes from the Tara Oceans expedition [6,7], the Red Sea [8], the Baltic Sea [9], and other aquatic and terrestrial sources [10]; 994 isolate genomes from the Genomic Encyclopedia of Bacteria and Archaea [11]; 7492 viral genomes from NCBI RefSeq [12]; 786 bacterial and archaeal genomes from MarRef [13]; and 677 marine single cell genomes [14]. To annotate metagenomic reads at the clade/ecotype level (subspecies) for the focal taxa Prochlorococcus, Synechococcus, SAR11/Pelagibacterales, SAR86, and SAR116, we generated custom MARMICRODB taxonomies based on curated genome phylogenies for each group. The curated phylogenies, the Kaiju-formatted Burrows-Wheeler index, the translated genes, the custom taxonomy hierarchy, an interactive Krona plot of the taxonomic composition, and scripts and instructions for using or rebuilding the resource are available from 10.5281/zenodo.3520509.

    Methods:
    The curation and quality control of MARMICRODB single cell, metagenome-assembled, and isolate genomes was performed as described in [15]. Briefly, we downloaded all MARMICRODB genomes as raw nucleotide assemblies from NCBI. We determined an initial genome taxonomy for these assemblies using CheckM with the default lineage workflow [16]. All genome bins met the completion/contamination thresholds outlined in prior studies [7,17]. For single cell and metagenome-assembled genomes, especially those from Tara Oceans Mediterranean Sea samples [18], we used the GTDB-Tk classification workflow [19] to verify the taxonomic fidelity of each genome bin. We then selected genomes with a CheckM taxonomic assignment of Prochlorococcus, Synechococcus, SAR11/Pelagibacterales, SAR86, and SAR116 for further analysis and confirmed their taxonomic assignment using BLAST matches to known Prochlorococcus/Synechococcus ITS sequences and by matching 16S sequences to the SILVA database [20]. To refine our estimates of completeness/contamination of Prochlorococcus genome bins, we created a custom set of 730 single-copy protein families (available from 10.5281/zenodo.3719132) from closed, isolate Prochlorococcus genomes [21] for quality assessments with CheckM. For Synechococcus, we used the CheckM taxonomic-specific workflow with the genus Synechococcus. After the custom CheckM quality control, we excluded from downstream analysis any genome bins with an estimated quality below 30, where quality is defined as %completeness - 5 × %contamination, resulting in 18769 genome/transcriptome bins. We predicted genes in the resulting genome bins using prodigal [22], excluded protein sequences shorter than 20 or longer than 20000 amino acids, removed non-standard amino acid residues, and condensed redundant protein sequences to a single representative sequence, to which we assigned a lowest common ancestor (LCA) taxonomy identifier from the NCBI taxonomy database [23]. The resulting protein sequences were compiled and used to build a Kaiju [3] search database.
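    The quality threshold and protein filters described in the Methods can be sketched as follows. The quality formula, the threshold of 30, and the 20 to 20000 amino acid length bounds come from the text; the order of operations (length filter before residue stripping), the choice of the 20 standard amino acids, and the function names are assumptions, and the LCA assignment is left out.

```python
# The 20 standard amino acids; anything else (X, B, Z, U, *, ...) is stripped.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def quality(completeness, contamination):
    """Quality score from the text: %completeness - 5 x %contamination."""
    return completeness - 5 * contamination


def passes_quality(completeness, contamination, threshold=30.0):
    """Bins with quality below the threshold (30 in the text) are excluded."""
    return quality(completeness, contamination) >= threshold


def clean_proteins(records, min_len=20, max_len=20000):
    """Length-filter, strip non-standard residues, and collapse duplicates.

    `records` is an iterable of (protein_id, sequence) pairs. Returns a dict
    mapping each unique cleaned sequence to the IDs that share it; an LCA
    taxon would then be assigned per key (not implemented here).
    """
    unique = {}
    for protein_id, seq in records:
        seq = seq.upper()
        if not min_len <= len(seq) <= max_len:
            continue  # shorter than 20 or longer than 20000 aa
        cleaned = "".join(aa for aa in seq if aa in STANDARD_AA)
        unique.setdefault(cleaned, []).append(protein_id)
    return unique
```

    Keeping the list of source IDs for each collapsed sequence is what makes the later LCA step possible, since the LCA is computed over the taxa of all genomes that share the sequence.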

    The above filtering criteria resulted in 605 Prochlorococcus, 96 Synechococcus, 186 SAR11/Pelagibacterales, 60 SAR86, and 59 SAR116 high-quality genome bins. We constructed a high-quality fixed reference phylogenetic tree for each taxonomic group based on genomes manually selected for completeness and phylogenetic diversity. For example, the Prochlorococcus and Synechococcus genomes in the fixed reference phylogeny are estimated to be > 90% complete, and the SAR11 genomes > 70% complete. We created multiple sequence alignments of phylogenetically conserved genes from these genomes using the GTDB-Tk pipeline [19] with default settings. The pipeline identifies conserved proteins (120 bacterial proteins) and generates concatenated multi-protein alignments [17] from the genome assemblies using hmmalign from the HMMER software suite. We further filtered the resulting alignment columns using the bacterial and archaeal alignment masks from [17] (http://gtdb.ecogenomic.org/downloads). We removed columns represented by fewer than 50% of all taxa and/or columns with no single amino acid residue occurring at a frequency greater than 25%. We trimmed the alignments using trimAl [24] with the automated -gappyout option to trim columns based on their gap distribution. We inferred reference phylogenies using multithreaded RAxML [25] with the GAMMA model of rate heterogeneity, empirically determined base frequencies, and the LG substitution model [26] (PROTGAMMALGF). Branch support is based on 250 resampled bootstrap trees. This tree was then pruned to allow a maximum average distance to the closest leaf (ADCL) of 0.003 to reduce phylogenetic redundancy in the tree [27]. We then "placed" genomes that either did not pass the completeness threshold or were considered phylogenetically redundant by ADCL within the fixed reference phylogeny for each group using pplacer [28], representing each placed genome as a pendant edge in the final tree.

    We then examined the resulting tree and manually selected clade/ecotype cutoffs to be as consistent as possible with clade definitions previously outlined for these groups [29–32]. We then gave clades from each taxonomic group custom taxonomic identifiers and added these identifiers to the MARMICRODB Kaiju taxonomic hierarchy.
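    The two alignment-column filters described above (taxon representation and residue consensus) can be sketched as below. This is an illustration, not GTDB-Tk's code: it assumes residue frequency is measured among the non-gap residues in each column, treats `-` and `.` as gaps, and uses hypothetical function names.

```python
from collections import Counter

GAPS = frozenset("-.")


def columns_to_keep(alignment, min_taxa=0.5, min_consensus=0.25):
    """Indices of columns that pass both filters described in the text.

    A column is dropped if fewer than `min_taxa` of the sequences have a
    residue there, or if its most common residue does not exceed
    `min_consensus` of the non-gap residues (ASSUMPTION about the denominator).
    """
    n_taxa = len(alignment)
    keep = []
    for i in range(len(alignment[0])):
        residues = [seq[i] for seq in alignment if seq[i] not in GAPS]
        if not residues or len(residues) < min_taxa * n_taxa:
            continue  # represented by fewer than 50% of taxa
        top_count = Counter(residues).most_common(1)[0][1]
        if top_count / len(residues) > min_consensus:
            keep.append(i)
    return keep


def apply_mask(alignment, keep):
    """Return the alignment restricted to the kept column indices."""
    return ["".join(seq[i] for i in keep) for seq in alignment]
```

    Both filters remove columns that carry little phylogenetic signal (mostly gaps, or no dominant residue) before the gap-based trimming with trimAl.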

    Software/databases used:
    checkM v1.0.11 [16]
    HMMER v3.1b2 (http://hmmer.org/)
    prodigal v2.6.3 [22]
    trimAl v1.4.rev22 [24]
    AliView v1.18.1 [33] [34]
    Phyx v0.1 [35]
    RAxML v8.2.12 [36]
    Pplacer v1.1alpha [28]
    GTDB-Tk v0.1.3 [19]
    Kaiju v1.6.0 [34]
    GTDB RS83 (https://data.ace.uq.edu.au/public/gtdb/data/releases/release83/83.0/)
    NCBI Taxonomy (accessed 2018-07-02) [23]
    TIGRFAM v14.0 [37]
    PFAM v31.0 [38]

    Discussion/Caveats:
    MARMICRODB is optimized for metagenomic samples from the marine environment, in particular planktonic microbes from the pelagic euphotic zone. We expect this database may also be useful for classifying other types of marine metagenomic samples (for example mesopelagic, bathypelagic, or even benthic or marine host-associated), but it has not been tested as such. The original purpose of this database was to quantify clades/ecotypes of Prochlorococcus, Synechococcus, SAR11/Pelagibacterales, SAR86, and SAR116 in metagenomes from the Tara Oceans Expedition and the GEOTRACES project. We carefully annotated and quality controlled genomes from these five groups, but the processing of the other marine taxa was largely automated and unsupervised. Taxonomy for other groups was copied over from the Genome Taxonomy Database (GTDB) [19,39] and NCBI Taxonomy [23], so any inconsistencies in those databases will be propagated to MARMICRODB. For most use cases MARMICRODB can probably be used unmodified, but if the user's goal is to focus on a particular organism/clade that we did not curate in the database, the user may wish to spend some time curating those genomes (i.e., checking for contamination, dereplicating, building a genome phylogeny for custom taxonomy node assignment). Currently the custom taxonomy is hardcoded in the MARMICRODB.fmi index, but users who wish to modify MARMICRODB by adding or removing genomes, or by reconfiguring taxonomic ranks, can easily modify the names.dmp and nodes.dmp files as well as the FASTA file of protein sequences. However, the Kaiju index will need to be rebuilt, and the user will require a high
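    Kaiju reads its taxonomy from NCBI-style `nodes.dmp` and `names.dmp` files, so checking a modified taxonomy is easier with a small parser for the tab-pipe-tab delimited format. The sketch below reads those two files and walks a taxon's lineage to the root. It assumes the standard NCBI field order (tax ID, parent tax ID, rank in `nodes.dmp`; tax ID, name, unique name, name class in `names.dmp`); the function names and the toy IDs in the test are hypothetical.

```python
def parse_dmp(lines):
    r"""Yield the fields of each line in an NCBI-style .dmp file.

    Fields are separated by "\t|\t" and each line ends with "\t|".
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t|\t")
        fields[-1] = fields[-1].rstrip("\t|")  # drop the trailing "\t|"
        yield fields


def load_taxonomy(nodes_lines, names_lines):
    """Return {taxid: (parent_taxid, rank)} and {taxid: scientific name}."""
    nodes = {int(f[0]): (int(f[1]), f[2]) for f in parse_dmp(nodes_lines)}
    names = {int(f[0]): f[1] for f in parse_dmp(names_lines)
             if len(f) > 3 and f[3] == "scientific name"}
    return nodes, names


def lineage(taxid, nodes, names):
    """List (rank, name) pairs from the root down to `taxid`.

    Assumes the root is the node that is its own parent (tax ID 1 in NCBI).
    """
    path = []
    while True:
        parent, rank = nodes[taxid]
        path.append((rank, names.get(taxid, str(taxid))))
        if parent == taxid:
            return path[::-1]
        taxid = parent
```

    With real files, pass open file handles, for example `load_taxonomy(open("nodes.dmp"), open("names.dmp"))`. Walking the lineage of a newly added custom node is a quick way to confirm it hangs under the intended parent before rebuilding the Kaiju index.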

  11. Analytical tools and databases that use predictive microbiology to support...

    • figshare.com
    xls
    Updated Aug 23, 2023
    Cite
    Joan B. Rose; Nynke Hofstra; Erica Hollmann; Panagis Katsivelis; Gertjan J. Medema; Heather M. Murphy; Colleen C. Naughton; Matthew E. Verbyla (2023). Analytical tools and databases that use predictive microbiology to support safe water and/or safe sanitation systems. [Dataset]. http://doi.org/10.1371/journal.pwat.0000166.t003
    Dataset provided by
    PLOS Water
    Authors
    Joan B. Rose; Nynke Hofstra; Erica Hollmann; Panagis Katsivelis; Gertjan J. Medema; Heather M. Murphy; Colleen C. Naughton; Matthew E. Verbyla
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analytical tools and databases that use predictive microbiology to support safe water and/or safe sanitation systems.

  12. MPIDB

    • dknet.org
    • scicrunch.org
    Updated Jan 29, 2022
    Cite
    (2022). MPIDB [Dataset]. http://identifiers.org/RRID:SCR_001898
    Description

    A database that collects and provides all known physical microbial interactions. Currently, 24,295 experimentally determined interactions among proteins of 250 bacterial species/strains can be browsed and downloaded. These interactions have been manually curated from the literature or imported from other databases (IntAct, DIP, BIND, MINT) and are linked to 26,578 pieces of experimental evidence (PubMed ID, PSI-MI methods). Unlike in those source databases, interactions in MPIDB are further supported by 68,346 additional pieces of evidence based on interaction conservation, co-purification, and 3D domain contacts (iPfam, 3did). Binary interactions inferred from pull-down experiments (via the spoke or matrix models) are not included.

  13. MBGD - Microbial Genome Database

    • dknet.org
    • scicrunch.org
    Updated Jan 29, 2022
    (2022). MBGD - Microbial Genome Database [Dataset]. http://identifiers.org/RRID:SCR_012824/resolver
    Dataset updated
    Jan 29, 2022
    Description

    MBGD is a database for comparative analysis of completely sequenced microbial genomes, the number of which is now growing rapidly. The aim of MBGD is to facilitate comparative genomics from various points of view, such as ortholog identification, paralog clustering, motif analysis and gene order comparison. The core function of MBGD is to create an orthologous (or homologous) gene cluster table. For this purpose, similarities between all genes are precomputed and stored in the database, in addition to gene annotations such as the function categories assigned by the original authors and the motifs found in the translated sequences. Using these homology data, MBGD dynamically creates an orthologous gene cluster table. Users can change the set of organisms or the cutoff parameters to create their own orthologous grouping. Based on this cluster table, users can further analyze multiple genomes from various points of view with functions such as global map comparison, local map comparison, multiple sequence alignment and phylogenetic tree construction.
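    The effect of a user-adjustable cutoff on a grouping built from precomputed similarities can be illustrated with a deliberately simplified Python sketch: any two genes whose similarity score meets the cutoff are placed in the same group (single-linkage clustering via union-find). This is a stand-in for illustration only, not MBGD's own clustering algorithm, which is more elaborate; the gene names and scores are hypothetical.

```python
from collections import defaultdict


def cluster_genes(genes, similarities, cutoff):
    """Single-linkage grouping: any gene pair scoring >= cutoff ends up in one cluster."""
    parent = {g: g for g in genes}

    def find(g):
        # Walk to the root, halving the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), score in similarities.items():
        if score >= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two clusters

    clusters = defaultdict(list)
    for g in genes:
        clusters[find(g)].append(g)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical genes from two organisms and precomputed similarity scores.
genes = ["org1_g1", "org2_g1", "org1_g2", "org2_g2"]
similarities = {
    ("org1_g1", "org2_g1"): 350.0,
    ("org1_g2", "org2_g2"): 120.0,
    ("org1_g1", "org1_g2"): 40.0,
}

print(cluster_genes(genes, similarities, cutoff=100.0))
print(cluster_genes(genes, similarities, cutoff=200.0))
```

    Raising the cutoff from 100 to 200 splits the weaker org1_g2/org2_g2 pair into singletons, while lowering it to 30 merges everything through the 40.0 link; that sensitivity is why exposing the cutoff lets users shape their own grouping.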

  14. Database of Anti-Microbial Peptides

    • bioregistry.io
    Updated Oct 20, 2025
    (2025). Database of Anti-Microbial Peptides [Dataset]. https://bioregistry.io/registry/dbamp
    Dataset updated
    Oct 20, 2025
    Description

    Identifiers represent antimicrobial peptides in the Database of Antimicrobial Peptides (dbAMP), an open-access, manually curated database of antimicrobial peptides (AMPs).

  15. In silico Database for Identification of Microorganisms by Liquid...

    • zenodo.org
    bin
    Updated Jan 24, 2020
    Peter Lasch; Andy Schneider; Christian Blumenscheit; Joerg Doellinger (2020). In silico Database for Identification of Microorganisms by Liquid Chromatography-Mass Spectrometry (LC-MS1) [Dataset]. http://doi.org/10.5281/zenodo.3573996
    Available download formats: bin
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Peter Lasch; Andy Schneider; Christian Blumenscheit; Joerg Doellinger
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Modern mass spectrometry methods that allow reliable, fast and cost-effective identification of pathogenic microorganisms have emerged in recent years. For example, matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) has revolutionized the way pathogenic microorganisms are identified in today’s routine clinical microbiology. Recent years have also seen substantial progress in the development of liquid chromatography-mass spectrometry (LC-MS) based proteomics for microbiological applications.

    In this context, we introduce a new concept for microbial identification by mass spectrometry. The proposed approach involves efficient extraction of proteins from cultivated microbial cells, digestion by trypsin, and LC-MS measurement. MS1 data are then extracted and systematically tested against in silico libraries of peptide mass data. The first version of such a database has been computed from the UniProt Knowledgebase (Swiss-Prot and TrEMBL) and contains more than 12,000 strain-specific synthetic mass profiles. The database is stored in the pkf data format, which can be read by the MicrobeMS software package (MicrobeMS version 0.82 or later is required).

    For details see the following preprint: Lasch, P., Schneider, A., Blumenscheit, C. and Doellinger, J. “Identification of Microorganisms by Liquid Chromatography-Mass Spectrometry (LC-MS1) and in silico Peptide Mass Data”. bioRxiv preprint, http://dx.doi.org/10.1101/870089.
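    The digest-and-match idea can be sketched in Python under simplifying assumptions: trypsin cleaves after K or R unless the next residue is P, no missed cleavages are allowed, peptides contain only unmodified standard residues, observed MS1 values are already neutral monoisotopic peptide masses (charge states deconvoluted), and a strain's score is simply the fraction of observed masses matched within a ppm tolerance. The scoring rule, the 10 ppm tolerance and the two-strain library are hypothetical; this is not the algorithm implemented in MicrobeMS.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the terminal H and OH


def tryptic_digest(protein):
    """Cleave after K or R unless followed by P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_fraction(observed, theoretical, tol_ppm=10.0):
    """Fraction of observed masses lying within tol_ppm of any theoretical mass."""
    if not observed:
        return 0.0
    hits = sum(
        1 for m in observed
        if any(abs(m - t) <= t * tol_ppm * 1e-6 for t in theoretical)
    )
    return hits / len(observed)


def identify(observed, library, tol_ppm=10.0):
    """Return (strain, score) for the in silico profile that best explains the data."""
    scores = {}
    for strain, proteins in library.items():
        masses = [peptide_mass(p) for prot in proteins for p in tryptic_digest(prot)]
        scores[strain] = match_fraction(observed, masses, tol_ppm)
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical two-strain library and one observed peptide mass.
library = {"strain_X": ["AAKPGGRDDK"], "strain_Y": ["GGGGK"]}
print(tryptic_digest("AAKPGGRDDK"))
print(identify([peptide_mass("DDK")], library))
```

    Note how the K followed by P in "AAKPGGRDDK" is left uncut, so the digest yields two peptides rather than three.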

  16. Fermented Foods Microbial Genomes Database

    • osti.gov
    Updated Jun 17, 2025
    USDOE Office of Science (SC), Biological and Environmental Research (BER) (2025). Fermented Foods Microbial Genomes Database [Dataset]. http://doi.org/10.25982/218406.47/2569606
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    United States Department of Energy (http://energy.gov/)
    Office of Science (http://www.er.doe.gov/)
    Department of Energy Biological and Environmental Research Program
    Microcosm Foods
    Description

    This database contains ~4,300 microbial genomes assembled from diverse fermented foods. These genomes were obtained from a larger set of 13,850 microbial genomes by clustering them at 99% average nucleotide identity (ANI) to create a "species"-representative database.
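    Choosing representatives at a fixed ANI threshold can be illustrated with a greedy Python sketch: genomes are visited in a preferred order (for example, best assembly quality first) and each one either joins the first existing representative with ANI at or above 99% or becomes a new representative. The greedy strategy, the ordering and the ANI values are assumptions for illustration, with ANI taken as precomputed; the description does not state which clustering procedure was actually used.

```python
def pick_representatives(genomes, ani, threshold=99.0):
    """Greedy dereplication: return {representative: [members]}.

    genomes: genome IDs ordered best-first (e.g. by assembly quality).
    ani: {frozenset({id1, id2}): percent ANI}; pairs not listed count as 0.
    """
    members = {}
    for genome in genomes:
        for rep in members:  # representatives, in the order they were chosen
            if ani.get(frozenset((genome, rep)), 0.0) >= threshold:
                members[rep].append(genome)
                break
        else:
            members[genome] = [genome]  # no close representative: start a new cluster
    return members


# Hypothetical genomes and pairwise ANI values (percent).
genomes = ["g1", "g2", "g3", "g4"]
ani = {
    frozenset(("g1", "g2")): 99.6,
    frozenset(("g1", "g3")): 95.2,
    frozenset(("g2", "g3")): 95.0,
    frozenset(("g3", "g4")): 99.1,
}

print(pick_representatives(genomes, ani))
```

    With these values, g2 is absorbed by g1 and g4 by g3, leaving two representatives out of four genomes.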

  17. Version 4 (20230306) of the MALDI-ToF Mass Spectrometry Database for...

    • data.niaid.nih.gov
    Updated Dec 27, 2024
    Lasch, Peter; Stämmler, Maren; Schneider, Andy (2024). Version 4 (20230306) of the MALDI-ToF Mass Spectrometry Database for Identification and Classification of Highly Pathogenic Microorganisms from the Robert Koch-Institute (RKI) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7702374
    Dataset updated
    Dec 27, 2024
    Dataset provided by
    Robert Koch Institute (https://www.rki.de/)
    Authors
    Lasch, Peter; Stämmler, Maren; Schneider, Andy
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    (Version 20230306)

    Version 4 (20230306) of the RKI MALDI-ToF mass spectra database is the third update of the original database (version 20161027, https://doi.org/10.5281/zenodo.163517). The RKI Database v.4 now contains a total of 11055 MALDI-ToF mass spectra from 1599 microbial strains of highly pathogenic (i.e. biosafety level 3, BSL-3) bacteria such as Bacillus anthracis, Brucella melitensis, Yersinia pestis, Burkholderia mallei / pseudomallei and Francisella tularensis as well as a selection of spectra of their close and distant relatives. The database can be used as a reference for the diagnosis of BSL-3 bacteria using proprietary and free software packages for MALDI-ToF MS-based microbial identification. The spectral data are provided as a zip archive (zenodo db 230306.zip) containing the original mass spectra in their native data format (Bruker Daltonics). Please refer to the pdf file (230306-ZENODO-Metadata.pdf) for information on cultivation conditions, sample preparation and details of the spectra acquisition. Please do not try to print this document (>1600 pages!).

    Version 20230306 of the RKI database contains, for the first time, a file in btmsp format (230306_v4_RKI_DB_BSL3.btmsp). This file was generated using the MALDI Biotyper software (Bruker Daltonics) and contains a total of 1599 main spectra from the BSL-3 database in the proprietary data format of the MALDI Biotyper software. Files in *.btmsp format can be imported and used for identification with this software solution. Note that the btmsp file available in database version 4 is broken and cannot be imported. Please refer to the updated database versions (4.1 or 4.2) to download valid btmsp files.

    The pkf files (230306_ZENODO_30Peaks_0.75.pkf, 230306_ZENODO_45Peaks_0.75.pkf) represent two versions of the MS peak list data in a Matlab-compatible format. These peak lists can be imported into MicrobeMS, a free Matlab-based software solution developed at the RKI. MicrobeMS can be used for the identification of microorganisms by MALDI-ToF MS and is available at https://wiki-ms.microbe-ms.com.
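    One simple way to compare a sample spectrum with a reference peak list is to match peak positions within a relative mass tolerance. The Python sketch below keeps the N most intense sample peaks and reports the fraction of reference peaks matched within that tolerance. The value of N, the 500 ppm tolerance, the scoring rule and all peak values are hypothetical; they do not describe how MicrobeMS or the MALDI Biotyper score spectra, nor what the "30Peaks"/"45Peaks" and "0.75" parts of the file names denote.

```python
def top_peaks(peaks, n):
    """Keep the n most intense (m/z, intensity) peaks, returned as sorted m/z values."""
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:n]
    return sorted(mz for mz, _ in strongest)


def peak_match_score(sample_mz, reference_mz, tol_ppm=500.0):
    """Fraction of reference peaks that have a sample peak within tol_ppm."""
    if not reference_mz:
        return 0.0
    matched = sum(
        1 for ref in reference_mz
        if any(abs(s - ref) <= ref * tol_ppm * 1e-6 for s in sample_mz)
    )
    return matched / len(reference_mz)


# Hypothetical sample peaks (m/z, intensity) and a hypothetical reference peak list.
sample_peaks = [(4365.2, 900.0), (5380.6, 400.0), (6254.1, 50.0), (7268.9, 700.0)]
reference = [4365.0, 7270.0, 9000.0]

selected = top_peaks(sample_peaks, 3)
print(selected, peak_match_score(selected, reference))
```

    Restricting the sample to its strongest peaks before matching is one way to keep low-intensity noise from inflating the score.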

    The RKI mass spectrometry database is updated regularly.

    The author would like to thank the following individuals for providing microbial strains and species or mass spectra thereof. Without their help, this work would not have been possible.

    Wolfgang Beyer - University of Hohenheim, Faculty of Agricultural Sciences, Stuttgart, Germany

    Guido Werner - Robert Koch-Institute, Nosocomial Pathogens and Antibiotic Resistances (FG13), Wernigerode, Germany

    Alejandra Bosch - CINDEFI, CONICET-CCT La Plata, Facultad de Ciencias Exactas, Universidad Nacional de La Plata, La Plata, Buenos Aires, Argentina

    Michal Drevinek - National Institute for Nuclear, Biological and Chemical Protection, Milin, Czech Republic

    Roland Grunow, Daniela Jacob, Silke Klee, Susann Dupke and Holger Scholz - Robert Koch-Institute, Highly Pathogenic Microorganisms (ZBS2), Berlin, Germany

    Jörg Rau - Chemisches und Veterinäruntersuchungsamt Stuttgart, Fellbach, Germany

    Jens Jacob - Robert Koch-Institute, Hospital Hygiene, Infection Prevention and Control (FG14), Berlin, Germany

    Martin Mielke - Robert Koch-Institute, Department 1 - Infectious Diseases, Berlin, Germany

    Monika Ehling-Schulz - Functional Microbiology, Institute of Microbiology, University of Veterinary Medicine, Vienna, Austria

    Armand Paauw - Department of Medical Microbiology, CBRN protection, Universitair Medisch Centrum Utrecht, TNO, Rijswijk, The Netherlands

    Herbert Tomaso – Friedrich-Löffler-Institut (FLI), Federal Research Institute for Animal Health, Jena, Germany

    Gabriel Karner - Karner Düngerproduktion GmbH, Research & Development, Neulengbach, Austria

    Rainer Borriss - Institute of Marine Biotechnology e.V. (IMaB), Greifswald, Germany

    Le Thi Thanh Tam - Division of Plant Pathology and Phyto-Immunology, Plant Protection Research Institute, Hanoi, Socialist Republic of Vietnam

    Xuewen Gao - College of Plant Protection, Nanjing Agricultural University, Key Laboratory of Integrated Management of Crop Diseases and Pests, Nanjing, People’s Republic of China

  18. Agricultural Research Service Culture Collection (NRRL - Northern Regional...

    • agdatacommons.nal.usda.gov
    • catalog.data.gov
    bin
    Updated Nov 22, 2025
    USDA/ARS, Mycotoxin Prevention and Applied Microbiology (2025). Agricultural Research Service Culture Collection (NRRL - Northern Regional Research Laboratory) Database [Dataset]. http://doi.org/10.15482/USDA.ADC/1327001
    Available download formats: bin
    Dataset updated
    Nov 22, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Authors
    USDA/ARS, Mycotoxin Prevention and Applied Microbiology
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Description

    The ARS Culture Collection is one of the largest public collections of microorganisms in the world, containing approximately 93,000 strains of bacteria and fungi. The collection is split into subcollections of molds, prokaryotes, and yeasts. In addition, the online catalog is searchable by genus, species, subvar type, and subspecies. The collection is housed within the Mycotoxin Prevention and Applied Microbiology Research Unit at the National Center for Agricultural Utilization Research in Peoria, Illinois. The scientists and staff of the ARS Culture Collection conduct and facilitate microbiological research that advances agricultural production, food safety, public health, and economic development. These goals are pursued through in-house research that improves understanding and utilization of microbiological diversity and through efforts to enhance the value and accessibility of microbial accessions in the Agricultural Research Service Culture Collection.

    Resources in this dataset:

    Resource Title: The ARS Culture (NRRL) Collection Online Catalog. File Name: Web Page, URL: https://nrrl.ncaur.usda.gov/ (online catalog and database server for the ARS Culture Collection, NRRL).

  19. Data from: Cogeme Phytopathogenic Fungi and Oomycete EST Database

    • ore.exeter.ac.uk
    application/x-gzip
    Updated Jul 30, 2025
    Darren Soanes; Nicholas J. Talbot (2025). Cogeme Phytopathogenic Fungi and Oomycete EST Database [Dataset]. https://ore.exeter.ac.uk/articles/dataset/Cogeme_Phytopathogenic_Fungi_and_Oomycete_EST_Database/29673914
    Available download formats: application/x-gzip
    Dataset updated
    Jul 30, 2025
    Dataset provided by
    University of Exeter
    Authors
    Darren Soanes; Nicholas J. Talbot
    License

    All rights reserved (https://www.rioxx.net/licenses/all-rights-reserved)

    Description

    Expressed sequence tags (ESTs) have been obtained from eighteen species of plant pathogenic fungi, two species of phytopathogenic oomycetes and three species of saprophytic fungi. Hierarchical clustering software was used to group ESTs representing the same gene and assemble each group into a single contig, or consensus sequence. The unisequence set for each pathogen therefore represents a set of unique gene sequences, each consisting of either a single EST or a contig assembled from a group of ESTs. Unisequences were annotated based on their top blastx hits against the NCBI non-redundant protein database.
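    Selecting a top hit per unisequence from blastx results can be sketched in Python, assuming BLAST+ tabular output (`-outfmt 6` with its default 12 columns, E-value in column 11 and bit score in column 12) and taking the hit with the highest bit score as the "top hit". The exact selection criteria used for this database are not stated, and the sequence names and scores below are hypothetical.

```python
def top_hits(blast_tab):
    """Return {query: (subject, evalue, bitscore)} keeping the highest bit score per query."""
    best = {}
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical blastx hits for two unisequences (12 tab-separated columns each).
blast_tab = "\n".join([
    "uni_001\tsubj_A\t62.5\t120\t45\t0\t1\t360\t5\t124\t3e-40\t150.2",
    "uni_001\tsubj_B\t80.1\t118\t23\t1\t4\t357\t10\t127\t1e-55\t201.7",
    "uni_002\tsubj_C\t45.0\t90\t49\t2\t10\t279\t1\t90\t2e-10\t60.3",
])

for query, (subject, evalue, bitscore) in sorted(top_hits(blast_tab).items()):
    print(query, subject, evalue, bitscore)
```

    Ranking by bit score rather than E-value avoids ties between hits whose E-values both round to zero.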

  20. Bacterial strain panel used in this study.

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 4, 2023
    Erin P. Price; Derek S. Sarovich; Jessica R. Webb; Jennifer L. Ginther; Mark Mayo; James M. Cook; Meagan L. Seymour; Mirjam Kaestli; Vanessa Theobald; Carina M. Hall; Joseph D. Busch; Jeffrey T. Foster; Paul Keim; David M. Wagner; Apichai Tuanyok; Talima Pearson; Bart J. Currie (2023). Bacterial strain panel used in this study. [Dataset]. http://doi.org/10.1371/journal.pone.0071647.t001
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Erin P. Price; Derek S. Sarovich; Jessica R. Webb; Jennifer L. Ginther; Mark Mayo; James M. Cook; Meagan L. Seymour; Mirjam Kaestli; Vanessa Theobald; Carina M. Hall; Joseph D. Busch; Jeffrey T. Foster; Paul Keim; David M. Wagner; Apichai Tuanyok; Talima Pearson; Bart J. Currie
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    a Numbers in parentheses indicate Thai strains; all other strains were isolated in the Northern Territory, Australia.

    b Species assignment based on [28].

ComBase: A Combined Database For Predictive Microbiology

RRID:SCR_008181, nif-0000-21095, r3d100010878, ComBase: A Combined Database For Predictive Microbiology (RRID:SCR_008181), ComBase

9 scholarly articles cite this dataset.
Dataset updated
Jan 29, 2022
