100+ datasets found

ComBase: A Combined Database For Predictive Microbiology
neuinfo.org
Updated Nov 19, 2006
Cite
(2006). ComBase: A Combined Database For Predictive Microbiology [Dataset]. http://identifiers.org/RRID:SCR_008181
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_008181
Dataset updated
Nov 19, 2006
Description
A database of information about how microorganisms respond to different environments. The information in ComBase is referred to as quantitative microbiological data since it describes how levels of microorganisms, both spoilage organisms and pathogens, change over the course of time. The primary goal of the ComBase consortium is to improve efficiency in locating specific microbiological information, to provide a more rapid means to compare data from different laboratories, and to reduce unnecessary redundancy in conducting microbiological studies. ComBase was launched in 2003. The ComBase Initiative is a collaboration between the Food Standards Agency and the Institute of Food Research from the United Kingdom; the USDA Agricultural Research Service and its Eastern Regional Research Center from the United States; and the Food Safety Center in Australia. Its purpose is to make data and predictive tools on microbial responses to food environments freely available via web-based software. The ComBase Database (accessible via the ComBase Browser) consists of thousands of microbial growth and survival curves that have been collated in research establishments and from publications. They form the basis for numerous microbial models presented in ComBase Predictor, a useful tool for industry, academia and regulatory agencies. They can be used in developing new food technologies while maintaining food safety; in teaching and research; and in assessing the microbial risk in foods or setting up new guidelines.
ARS Microbial Genomic Sequence Database Server
catalog.data.gov
Updated Apr 21, 2025
Cite
Agricultural Research Service (2025). ARS Microbial Genomic Sequence Database Server [Dataset]. https://catalog.data.gov/dataset/ars-microbial-genomic-sequence-database-server-1b81c
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service
Description
This database server is supported in fulfilment of the research mission of the Mycotoxin Prevention and Applied Microbiology Research Unit at the National Center for Agricultural Utilization Research in Peoria, Illinois. The linked website provides access to gene sequence databases for various groups of microorganisms, such as Streptomyces species or Aspergillus species and their relatives, that are the product of ARS research programs. The sequence databases are organized in the BIGSdb (Bacterial Isolate Genomic Sequence Database) software package developed by Keith Jolley and Martin Maiden at Oxford University. Resources in this dataset: Resource Title: ARS Microbial Genomic Sequence Database Server. File Name: Web Page, url: http://199.133.98.43
MiST - Microbial Signal Transduction database
neuinfo.org
rrid.site
Updated Jan 29, 2022
Cite
(2022). MiST - Microbial Signal Transduction database [Dataset]. http://identifiers.org/RRID:SCR_003166
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_003166
Dataset updated
Jan 29, 2022
Description
Database which contains the signal transduction proteins for complete and draft bacterial and archaeal genomes. The MiST2 database identifies and catalogs the repertoire of signal transduction proteins in microbial genomes.
Microbial Protein Interaction Database
bioregistry.io
Updated Dec 18, 2021
Cite
(2021). Microbial Protein Interaction Database [Dataset]. https://bioregistry.io/registry/mpid
Explore at:
Dataset updated
Dec 18, 2021
Description
The microbial protein interaction database (MPIDB) provides physical microbial interaction data. The interactions are manually curated from the literature or imported from other databases, and are linked to supporting experimental evidence, as well as evidence based on interaction conservation, protein complex membership, and 3D domain contacts.
MARMICRODB database for taxonomic classification of (marine) metagenomes
zenodo.org
application/gzip, bin +3
Updated Mar 20, 2020
Cite
Shane L Hogle (2020). MARMICRODB database for taxonomic classification of (marine) metagenomes [Dataset]. http://doi.org/10.5281/zenodo.3520509
Explore at:
Available download formats: bin, application/gzip, tsv, html, bz2
Unique identifier
https://doi.org/10.5281/zenodo.3520509
Dataset updated
Mar 20, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Shane L Hogle
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction:
This sequence database (MARMICRODB) was introduced in the publication JW Becker, SL Hogle, K Rosendo, and SW Chisholm. 2019. Co-culture and biogeography of Prochlorococcus and SAR11. ISME J. doi:10.1038/s41396-019-0365-4. Please see the original publication and its associated supplementary material for the original description of this resource.

Motivation:
We needed a reference database to annotate shotgun metagenomes from the Tara Oceans project [1], the GEOTRACES cruises GA02, GA03, GA10, and GP13, and the HOT and BATS time series [2]. Our interests are primarily in quantifying and annotating the free-living, oligotrophic bacterial groups Prochlorococcus, Pelagibacterales/SAR11, SAR116, and SAR86 from these samples using the protein classifier tool Kaiju [3]. Kaiju’s sensitivity and classification accuracy depend on the composition of the reference database, and the highest sensitivity is achieved when the reference database contains a comprehensive representation of expected taxa from an environment/sample of interest. However, the speed of the algorithm decreases as database size increases. Therefore, we aimed to create a reference database that maximized the representation of sequences from marine bacteria, archaea, and microbial eukaryotes, while minimizing (but not excluding) the sequences from clinical, industrial, and terrestrial host-associated samples.

Results/Description:
MARMICRODB consists of 56 million non-redundant protein sequences from 18769 bacterial/archaeal/eukaryote genome and transcriptome bins and 7492 viral genomes optimized for use with the protein homology classifier Kaiju [3]. To ensure maximum representation of marine bacteria, archaea, and microbial eukaryotes, we included translated genes/transcripts from 5397 representative “specI” species clusters from the proGenomes database [4]; 113 transcriptomes from the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) [5]; 10509 metagenome assembled genomes from the Tara Oceans expedition [6,7], the Red Sea [8], the Baltic Sea [9], and other aquatic and terrestrial sources [10]; 994 isolate genomes from the Genomic Encyclopedia of Bacteria and Archaea [11]; 7492 viral genomes from NCBI RefSeq [12]; 786 bacterial and archaeal genomes from MarRef [13]; and 677 marine single cell genomes [14]. In order to annotate metagenomic reads at the clade/ecotype level (subspecies) for the focal taxa Prochlorococcus, Synechococcus, SAR11/Pelagibacterales, SAR86, and SAR116, we generated custom MARMICRODB taxonomies based on curated genome phylogenies for each group. The curated phylogenies, Kaiju formatted Burrows-Wheeler index, translated genes, the custom taxonomy hierarchy, an interactive kronaplot of the taxonomic composition, and scripts and instructions for how to use or rebuild the resource are available from 10.5281/zenodo.3520509.

Methods:
The curation and quality control of MARMICRODB single cell, metagenome assembled, and isolate genomes was performed as described in [15]. Briefly, we downloaded all MARMICRODB genomes as raw nucleotide assemblies from NCBI. We determined an initial genome taxonomy for these assemblies using checkM with the default lineage workflow [16]. All genome bins met the completion/contamination thresholds outlined in prior studies [7,17]. For single cell and metagenome assembled genomes, especially those from Tara Oceans Mediterranean sea samples [18], we used the GTDB-Tk classification workflow [19] to verify the taxonomic fidelity of each genome bin. We then selected genomes with a checkM taxonomic assignment of Prochlorococcus, Synechococcus, SAR11/Pelagibacterales, SAR86, and SAR116 for further analysis and confirmed taxonomic assignment using blast matches to known Prochlorococcus/Synechococcus ITS sequences and by matching 16S sequences to the SILVA database [20]. To refine our estimates of completeness/contamination of Prochlorococcus genome bins we created a custom set of 730 single copy protein families (available from 10.5281/zenodo.3719132) from closed, isolate Prochlorococcus genomes [21] for quality assessments with checkM. For Synechococcus we used the CheckM taxonomic-specific workflow with the genus Synechococcus. After the custom CheckM quality control, we excluded from downstream analysis any genome bins that had an estimated quality < 30, defined as %completeness - 5 × %contamination, resulting in 18769 genome/transcriptome bins. We predicted genes in the resulting genome bins using prodigal [22], excluded protein sequences shorter than 20 or longer than 20000 amino acids, removed non-standard amino acid residues, and condensed redundant protein sequences to a single representative sequence to which we assigned a lowest common ancestor (LCA) taxonomy identifier from the NCBI taxonomy database [23].
The resulting protein sequences were compiled and used to build a Kaiju [3] search database.
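The two numeric filters described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the authors' scripts: the `GenomeBin` class, the function names, and the order-preserving de-duplication step are assumptions introduced here, and the LCA assignment and non-standard residue handling are left out.

```python
from dataclasses import dataclass


@dataclass
class GenomeBin:
    """Hypothetical container for CheckM-style estimates (in percent)."""
    name: str
    completeness: float
    contamination: float


def quality(genome_bin: GenomeBin) -> float:
    # Quality as defined in the text: %completeness - 5 x %contamination.
    return genome_bin.completeness - 5.0 * genome_bin.contamination


def passes_quality(genome_bin: GenomeBin, min_quality: float = 30.0) -> bool:
    # Bins with an estimated quality below 30 are excluded.
    return quality(genome_bin) >= min_quality


def keep_protein(sequence: str, min_len: int = 20, max_len: int = 20000) -> bool:
    # Proteins shorter than 20 or longer than 20000 residues are excluded.
    return min_len <= len(sequence) <= max_len


def filter_proteins(sequences):
    # Apply the length filter and keep one copy of each distinct sequence,
    # preserving first-seen order (assumed behavior for this sketch).
    seen = set()
    kept = []
    for seq in sequences:
        if keep_protein(seq) and seq not in seen:
            seen.add(seq)
            kept.append(seq)
    return kept
```

For example, a bin estimated at 90% completeness and 5% contamination scores 65 and is kept, while one at 50% completeness and 5% contamination scores 25 and is dropped.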

The above filtering criteria resulted in 605 Prochlorococcus, 96 Synechococcus, 186 SAR11/Pelagibacterales, 60 SAR86, and 59 SAR116 high-quality genome bins. We constructed a high quality fixed reference phylogenetic tree for each taxonomic group based on genomes manually selected for completeness and phylogenetic diversity. For example, the Prochlorococcus and Synechococcus genomes for the fixed reference phylogeny are estimated > 90% complete, and SAR11 genomes are estimated > 70% complete. We created multiple sequence alignments of phylogenetically conserved genes from these genomes using the GTDB-Tk pipeline [19] with default settings. The pipeline identifies conserved proteins (120 bacterial proteins) and generates concatenated multi-protein alignments [17] from the genome assemblies using hmmalign from the hmmer software suite. We further filtered the resulting alignment columns using the bacterial and archaeal alignment masks from [17] (http://gtdb.ecogenomic.org/downloads). We removed columns represented by fewer than 50% of all taxa and/or columns with no single amino acid residue occurring at a frequency greater than 25%. We trimmed the alignments using trimal [24] with the automated -gappyout option to trim columns based on their gap distribution. We inferred reference phylogenies using multithreaded RAxML [25] with the GAMMA model of rate heterogeneity, empirically determined base frequencies, and the LG substitution model [26] (PROTGAMMALGF). Branch support is based on 250 resampled bootstrap trees. This tree was then pruned to only allow a maximum average distance to the closest leaf (ADCL) of 0.003 to reduce the phylogenetic redundancy in the tree [27]. We then “placed” genomes that either did not pass the completeness threshold or were considered phylogenetically redundant by ADCL within the fixed reference phylogeny for each group using pplacer [28], representing each placed genome as a pendant edge in the final tree.
We then examined the resulting tree and manually selected clade/ecotype cutoffs to be as consistent as possible with clade definitions previously outlined for these groups [29–32]. We then gave clades from each taxonomic group custom taxonomic identifiers and we added these identifiers to the MARMICRODB Kaiju taxonomic hierarchy.
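The column-filtering rule mentioned above (drop columns present in fewer than 50% of taxa, and columns where no residue exceeds a 25% frequency) can be illustrated with a small sketch. This is not the GTDB-Tk or trimAl implementation. It assumes the alignment is a list of equal-length strings, that `-` and `.` mark gaps, and that residue frequency is computed among non-gap characters; the description does not say which denominator was used.

```python
from collections import Counter


def filter_columns(alignment, min_taxa_frac=0.5, min_top_freq=0.25, gap_chars="-."):
    """Return the alignment with low-coverage and low-consensus columns removed."""
    if not alignment:
        return []
    n_taxa = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    keep = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] not in gap_chars]
        # Rule 1: the column must be represented in at least 50% of taxa.
        if len(residues) < min_taxa_frac * n_taxa:
            continue
        # Rule 2: the most common residue must occur at a frequency above 25%
        # (assumption: frequency is measured among non-gap residues).
        top_count = Counter(residues).most_common(1)[0][1]
        if top_count / len(residues) <= min_top_freq:
            continue
        keep.append(col)

    return ["".join(seq[c] for c in keep) for seq in alignment]
```

Because the two conditions are joined by "and/or" in the text, a column is removed as soon as it fails either rule.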

Software/databases used:
checkM v1.0.11[16]
HMMERv3.1b2 (http://hmmer.org/)
prodigal v2.6.3 [22]
trimAl v1.4.rev22 [24]
AliView v1.18.1 [33] [34]
Phyx v0.1 [35]
RAxML v8.2.12 [36]
Pplacer v1.1alpha [28]
GTDB-Tk v0.1.3 [19]
Kaiju v1.6.0 [34]
GTDB RS83 (https://data.ace.uq.edu.au/public/gtdb/data/releases/release83/83.0/)
NCBI Taxonomy (accessed 2018-07-02) [23]
TIGRFAM v14.0 [37]
PFAM v31.0 [38]

Discussion/Caveats:
MARMICRODB is optimized for metagenomic samples from the marine environment, in particular planktonic microbes from the pelagic euphotic zone. We expect this database may also be useful for classifying other types of marine metagenomic samples (for example mesopelagic, bathypelagic, or even benthic or marine host-associated), but it has not been tested as such. The original purpose of this database was to quantify clades/ecotypes of Prochlorococcus, Synechococcus, SAR11/Pelagibacterales, SAR86, and SAR116 in metagenomes from the Tara Oceans Expedition and the GEOTRACES project. We carefully annotated and quality controlled genomes from these five groups, but the processing of the other marine taxa was largely automated and unsupervised. Taxonomy for other groups was copied over from the Genome Taxonomy Database (GTDB) [19,39] and NCBI Taxonomy [23], so any inconsistencies in those databases will be propagated to MARMICRODB. For most use cases MARMICRODB can probably be used unmodified, but if the user’s goal is to focus on a particular organism/clade that we did not curate in the database then the user may wish to spend some time curating those genomes (i.e., checking for contamination, dereplicating, building a genome phylogeny for custom taxonomy node assignment). Currently the custom taxonomy is hardcoded in the MARMICRODB.fmi index, but if users wish to modify MARMICRODB by adding or removing genomes, or reconfiguring taxonomic ranks, the names.dmp and nodes.dmp files can easily be modified, as well as the fasta file of protein sequences. However, the Kaiju index will need to be rebuilt, and user will require a high
Data from: ComBase: A Web Resource for Quantitative and Predictive Food...
agdatacommons.nal.usda.gov
catalog.data.gov
bin
Updated Nov 21, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ComBase Team (2025). ComBase: A Web Resource for Quantitative and Predictive Food Microbiology [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/ComBase_A_Web_Resource_for_Quantitative_and_Predictive_Food_Microbiology/25212404
Explore at:
Available download formats: bin
Dataset updated
Nov 21, 2025
Dataset provided by
United States Department of Agriculture (http://usda.gov/)
Agricultural Research Service (https://www.ars.usda.gov/)
Authors
ComBase Team
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Description
ComBase includes a systematically formatted database of quantified microbial responses to the food environment with more than 65,000 records, and is used for:

Informing the design of food safety risk management plans
Producing Food Safety Plans and HACCP plans
Reducing food waste
Assessing microbiological risk in foods

The ComBase Browser enables you to search thousands of microbial growth and survival curves that have been collated in research establishments and from publications. The ComBase Predictive Models are a collection of software tools based on ComBase data to predict the growth or inactivation of microorganisms as a function of environmental factors such as temperature, pH and water activity in broth. Interested users can also contribute growth or inactivation data via the Donate Data page, which includes instructional videos, a data template and sample, and an Excel demo file of data and macros for checking data format and syntax. Resources in this dataset: Resource Title: Website Pointer to ComBase. File Name: Web Page, url: https://www.combase.cc/index.php/en/ ComBase is an online tool for quantitative food microbiology. Its main features are the ComBase database and ComBase models, and it can be accessed on any web platform, including mobile devices. The focus of ComBase is describing and predicting how microorganisms survive and grow under a variety of primarily food-related conditions. ComBase is a highly useful tool for food companies to understand safer ways of producing and storing foods. This includes developing new food products and reformulating foods, designing challenge test protocols, producing Food Safety plans, and helping public health organizations develop science-based food policies through quantitative risk assessment. Over 60,000 records have been deposited into ComBase, describing how food environments, such as temperature, pH, and water activity, as well as other factors (e.g. preservatives and atmosphere) affect the growth of bacteria. Each data record shows users how bacterial populations change for a particular combination of environmental factors. Mathematical models (the ComBase Predictor and Food models) were developed on systematically generated data to predict how various organisms grow or survive under various conditions.
MBGD - Microbial Genome Database
neuinfo.org
Updated Feb 1, 2001
Cite
(2001). MBGD - Microbial Genome Database [Dataset]. http://identifiers.org/RRID:SCR_012824
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_012824
Dataset updated
Feb 1, 2001
Description
MBGD is a database for comparative analysis of completely sequenced microbial genomes, the number of which is now growing rapidly. The aim of MBGD is to facilitate comparative genomics from various points of view such as ortholog identification, paralog clustering, motif analysis and gene order comparison. The core function of MBGD is to create an orthologous or homologous gene cluster table. For this purpose, similarities between all genes are precomputed and stored in the database, in addition to gene annotations such as the function categories assigned by the original authors and the motifs found in the translated sequence. Using these homology data, MBGD dynamically creates an orthologous gene cluster table. Users can change the set of organisms or the cutoff parameters to create their own orthologous grouping. Based on this cluster table, users can further analyze multiple genomes from various points of view with functions such as global map comparison, local map comparison, multiple sequence alignment and phylogenetic tree construction.
Fermented Foods Microbial Genomes Database
osti.gov
Updated Jun 17, 2025
Cite
Microcosm Foods (2025). Fermented Foods Microbial Genomes Database [Dataset]. http://doi.org/10.25982/218406.47/2569606
Explore at:
Unique identifier
https://doi.org/10.25982/218406.47/2569606
Dataset updated
Jun 17, 2025
Dataset provided by
United States Department of Energy (http://energy.gov/)
Office of Science (http://www.er.doe.gov/)
Department of Energy Biological and Environmental Research Program
Microcosm Foods
Description
This database contains ~4,300 microbial genomes assembled from diverse fermented foods. These genomes were obtained from a larger set of 13,850 microbial genomes by clustering them at 99% average nucleotide identity (ANI) to create a "species"-representative database.
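To make the idea of a "species"-representative set concrete, here is a minimal greedy clustering sketch. The description does not say which tool or algorithm was used for the 99% ANI clustering, so everything here is an assumption for illustration: pairwise ANI values are taken as precomputed percentages in a dictionary, genomes are visited in the order given, and each genome joins the first representative it matches at or above the threshold.

```python
def greedy_ani_clusters(genomes, ani, threshold=99.0):
    """Group genomes around representatives using precomputed pairwise ANI.

    `ani` maps (genome_a, genome_b) pairs to percent identity; missing pairs
    are treated as 0.0 (hypothetical convention for this sketch).
    """
    def lookup(a, b):
        return ani.get((a, b), ani.get((b, a), 0.0))

    clusters = {}  # representative -> list of member genomes
    for genome in genomes:
        for rep in clusters:
            if lookup(genome, rep) >= threshold:
                clusters[rep].append(genome)
                break
        else:
            # No existing representative is close enough: start a new cluster.
            clusters[genome] = [genome]
    return clusters


example_ani = {("g1", "g2"): 99.5, ("g1", "g3"): 95.0, ("g2", "g3"): 96.0}
representatives = list(greedy_ani_clusters(["g1", "g2", "g3"], example_ani))
```

In this toy input, `g2` collapses into the cluster represented by `g1`, leaving `g1` and `g3` as representatives. Greedy results depend on input order, so real pipelines usually sort genomes by quality first.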
MPIDB
dknet.org
rrid.site
Updated Jan 29, 2022
Cite
(2022). MPIDB [Dataset]. http://identifiers.org/RRID:SCR_001898
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_001898
Dataset updated
Jan 29, 2022
Description
Database that collects and provides all known physical microbial interactions. Currently, 24,295 experimentally determined interactions among proteins of 250 bacterial species/strains can be browsed and downloaded. These microbial interactions have been manually curated from the literature or imported from other databases (IntAct, DIP, BIND, MINT) and are linked to 26,578 experimental evidences (PubMed ID, PSI-MI methods). In contrast to these databases, interactions in MPIDB are further supported by 68,346 additional evidences based on interaction conservation, co-purification, and 3D domain contacts (iPfam, 3did). Binary interactions (spoke/matrix) inferred from pull-down experiments are not included.
Microbial Community Database (MiCoDa). A curated global 16S rRNA gene...
gbif.org
Updated Oct 23, 2025
Cite
Stephanie Jurburg; Clara Arboleda-Baena; Anahita Kazem; Tobias Frøslev; Thomas Jeppesen (2025). Microbial Community Database (MiCoDa). A curated global 16S rRNA gene amplicon dataset from all environments [Dataset]. http://doi.org/10.15468/ver9ne
Explore at:
Unique identifier
https://doi.org/10.15468/ver9ne
Dataset updated
Oct 23, 2025
Dataset provided by
Global Biodiversity Information Facility (https://www.gbif.org/)
German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig
Authors
Stephanie Jurburg; Clara Arboleda-Baena; Anahita Kazem; Tobias Frøslev; Thomas Jeppesen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 2001 - Dec 31, 2023
Description
MiCoDa is a searchable database that hosts over 30,000 samples of processed 16S rRNA gene amplicon sequences from aquatic, host-associated, and mineral environments, spanning the entire globe. To improve cross-study comparability, all samples in MiCoDa have been sequenced in the same region of the 16S rRNA gene (between base pairs 515 and 806). MiCoDa also hosts the Earth Microbiome Project samples, processed in the same manner. MiCoDa is currently the largest public, human-curated microbiome database available. Its goal is to encourage the reuse of extant sequence data by specialists and non-specialists alike. To this end, we have manually curated the data and metadata included, preprocessed the sequence data to maximize comparability, and created a searchable data portal. MiCoDa is led by Dr. Stephanie Jurburg (microbial ecology), and hosted and supported by the Integrative Biodiversity Data and Code Unit of the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, the Microbial Interaction Ecology group of the Helmholtz Centre for Environmental Research - Leipzig, and the FUSION group of Friedrich Schiller Universität - Jena. For more information about MiCoDa and the Data Collection, visit https://micoda.idiv.de/v1/dataCollection
[This dataset was processed using the GBIF Metabarcoding Data Toolkit.]
National Microbial Germplasm Program
agdatacommons.nal.usda.gov
bin
Updated Nov 21, 2025
Cite
USDA ARS National Germplasm Resources Laboratory (2025). National Microbial Germplasm Program [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/National_Microbial_Germplasm_Program/24661746
Explore at:
Available download formats: bin
Dataset updated
Nov 21, 2025
Dataset provided by
National Germplasm Resources Laboratory
Authors
USDA ARS National Germplasm Resources Laboratory
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Description
The goal of the National Microbial Germplasm Program is to ensure that the genetic diversity of agriculturally important microorganisms is maintained to enhance and increase agricultural efficiency and profitability. The program collects, authenticates, and characterizes potentially useful microbial germplasm; preserves microbial genetic diversity; and facilitates distribution and utilization of microbial germplasm for research and industry.
The Agricultural Research Service maintains several microbial germplasm collections including:
USDA ARS Culture Collection
USDA ARS Collection of Entomopathogenic Fungal Cultures (ARSEF)
Query or Download the Rhizobium Database
US National Fungus Collections
Resources in this dataset: Resource Title: National Microbial Germplasm Program. File Name: Web Page, url: https://www.ars-grin.gov/Collections#microbial-germplasm Main web site for the National Microbial Germplasm Program with links to component databases/collections.
Hybrid Sample-Matched Databases
figshare.com
txt
Updated Jan 13, 2023
Cite
Elliot Lee (2023). Hybrid Sample-Matched Databases [Dataset]. http://doi.org/10.6084/m9.figshare.20164415.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.20164415.v1
Dataset updated
Jan 13, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Elliot Lee
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
FASTA files for the Hybrid_Sample-Matched databases. These files are for the initial databases, which were used to perform an initial search of the data; proteins that matched at least one spectrum were then used to create refined databases.
Sources of microbial reference materials
search.dataone.org
Updated Jul 17, 2025
Cite
Rui P. A. Pereira; John Bagnoli; Lisa Karstens; Kris Locken; Katherine A. Maki; Hena Ramay; Adam Rivers; Stephanie Servetas; Kezia Valyi; Yan Wang; Meredith L. Carpenter; Denise M. O'Sullivan; Katrine L. Whiteson; Amy D. Willis (2025). Sources of microbial reference materials [Dataset]. http://doi.org/10.5061/dryad.m63xsj45z
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.m63xsj45z
Dataset updated
Jul 17, 2025
Dataset provided by
Dryad Digital Repository
Authors
Rui P. A. Pereira; John Bagnoli; Lisa Karstens; Kris Locken; Katherine A. Maki; Hena Ramay; Adam Rivers; Stephanie Servetas; Kezia Valyi; Yan Wang; Meredith L. Carpenter; Denise M. O'Sullivan; Katrine L. Whiteson; Amy D. Willis
Time period covered
Jan 1, 2023
Description
Despite the importance of the microbiome in a wide array of human and environmental health settings, challenges remain in taking accurate and precise measurements of microbial communities. Challenges in measuring microbial communities can be partially addressed through the use of "reference materials," which we interpret as any physical material that can be used for quality control, validation, diagnostics, and standardization in metagenomic, microbiome, or multi-omics workflows. As members of the International Microbiome and Multi'Omics Standards Alliance (IMMSA) Reference Materials Working Group, we collated a list of available sources of microbial reference material standards. Each entry in our list includes a description, type of material, availability, storage requirements, biosafety level, species richness, and more. Due to the geographical composition of the working group, the list of materials may be biased towards materials that are available in regions of North America and Wes... See the README for a complete description, including information on references used to compile the spreadsheet.
# Sources of microbial reference materials

https://doi.org/10.5061/dryad.m63xsj45z

This datasheet lists sources and descriptions of microbial reference material standards, which we interpret as any physical material that can be used for quality control, validation, diagnostics, and standardization in metagenomic, microbiome, or multi-omics workflows. Each entry in our list includes a description, type of material, availability, storage requirements, biosafety level, species richness, and more.

Description of the data and file structure

Our submission is a single datasheet. Each row in the datasheet corresponds to a microbial reference material standard, and each column corresponds to a specific descriptor for that standard. Cells list "unknown" (or are left blank) when it was not possible to find the needed information from publicly available provider descriptions. Information for spreadsheet v1.1 was collated in August 2022, and can be ...
MicrobesOnline Comparative Genomics Database
datacatalog.mskcc.org
Updated Nov 13, 2019
Cite
Virtual Institute for Microbial Stress and Survival (2019). MicrobesOnline Comparative Genomics Database [Dataset]. https://datacatalog.mskcc.org/dataset/10391
Explore at:
Dataset updated
Nov 13, 2019
Dataset provided by
Virtual Institute for Microbial Stress and Survival
Description
The MicrobesOnline genome database contains over 1000 prokaryotic genomes. Genomes were last updated in late 2011 and no further database updates are planned.

All genomes are analyzed through the VIMSS genome pipeline. We use publicly available sequence analysis tools and databases to search for homologs (NCBI BLAST, UCSC Blat, SwissProt, COG) and protein domains (HMMer, InterPro), to assign gene ontologies (Gene Ontology Consortium) and EC numbers and to map the metabolic pathways (KEGG). We then link the orthology relationships between genes and predict operon structures.

Most genome data is downloaded from RefSeq. When an incomplete genome is directly downloaded from a sequencing center, we submit the genome sequence to RAST for automated annotation. For all genomes, we also search for CRISPR regions using PILER-CR and CRT.
In silico Database for Identification of Microorganisms by Liquid...
zenodo.org
bin
Updated Jan 24, 2020
Cite
Peter Lasch; Andy Schneider; Chistian Blumenscheit; Joerg Doellinger (2020). In silico Database for Identification of Microorganisms by Liquid Chromatography-Mass Spectrometry (LC-MS1) [Dataset]. http://doi.org/10.5281/zenodo.3573996
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.3573996
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Peter Lasch; Andy Schneider; Chistian Blumenscheit; Joerg Doellinger
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Modern methods of mass spectrometry have recently emerged that allow reliable, fast and cost-effective identification of pathogenic microorganisms. For example, matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) has revolutionized the way pathogenic microorganisms are identified in today’s routine clinical microbiology. Furthermore, recent years have also witnessed substantial progress in the development of liquid chromatography-mass spectrometry (LC-MS) based proteomics for microbiological applications.

In this context, we introduce a new concept for microbial identification by mass spectrometry. The proposed approach involves efficient extraction of proteins from cultivated microbial cells, digestion by trypsin and LC-MS measurements. MS1 data are then extracted and systematically tested against in silico libraries of peptide mass data. The first version of such a database has been computed from UniProt Knowledgebase [Swiss-Prot and TrEMBL] and contains more than 12,000 strain-specific synthetic mass profiles. The database is stored in the pkf data format which is interpretable by the MicrobeMS software package (requires MicrobeMS version 0.82, or later).
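As a rough illustration of what an in silico peptide mass list involves, the sketch below digests a protein sequence with a simple trypsin rule and computes monoisotopic peptide masses. It is not the authors' pipeline or the MicrobeMS pkf format. The cleavage rule (cut after K or R unless followed by P), the absence of missed cleavages and modifications, and the function names are assumptions made for this example; the residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # one water is added per free peptide


def tryptic_digest(protein):
    """Split a protein after K or R, except when the next residue is P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


def peptide_mass(peptide):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER_MASS


def mass_profile(protein):
    """Sorted list of peptide masses for one protein (a toy 'mass profile')."""
    return sorted(peptide_mass(p) for p in tryptic_digest(protein))
```

For instance, `tryptic_digest("AAKGGRPLK")` yields `["AAK", "GGRPLK"]`, because the arginine followed by proline is not cleaved under this rule. A strain-level profile would pool such masses over all proteins predicted for that strain.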

For details see the following preprint: Lasch, P., Schneider, A., Blumenscheit, C. and Doellinger, J. “Identification of Microorganisms by Liquid Chromatography-Mass Spectrometry (LC-MS1) and in silico Peptide Mass Data”. bioRxiv preprint, http://dx.doi.org/10.1101/870089.
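
To make the idea of an in silico peptide mass library concrete, the following Python sketch digests a protein sequence with trypsin rules, computes monoisotopic peptide masses, and counts how many observed masses fall within a tolerance of any library mass. It is only an illustration under stated assumptions: the function names, the 10 ppm tolerance, the absence of missed cleavages and modifications, and the simple hit count are all hypothetical, and the description above does not specify how MicrobeMS or the pkf format actually compute or score mass profiles.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the summed residues


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, library, tol_ppm=10.0):
    """Count observed masses lying within tol_ppm of any library mass."""
    hits = 0
    for mass in observed:
        if any(abs(mass - ref) / ref * 1e6 <= tol_ppm for ref in library):
            hits += 1
    return hits


# Hypothetical usage: build a tiny library from one made-up sequence.
library = [peptide_mass(p) for p in tryptic_digest("AKGGR")]
print(count_matches([peptide_mass("AK"), 500.0], library))
```

A real library would be built per strain from all of its annotated proteins, and a practical comparison would sort the library masses and use binary search rather than the quadratic scan shown here.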
Medicinal Plant Microbiome Database
scidb.cn
Updated Mar 22, 2024
Cite
Niu Yuqing; Chen Peng (2024). Medicinal Plant Microbiome Database [Dataset]. http://doi.org/10.57760/sciencedb.17282
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.57760/sciencedb.17282
Dataset updated
Mar 22, 2024
Dataset provided by
Science Data Bank
Authors
Niu Yuqing; Chen Peng
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Medicinal plants host abundant endophytic bacteria, fungi and actinomycetes. These microorganisms are closely tied to the growth, reproduction and metabolic activities of their host plants: they can affect not only the formation and content of a plant's medicinal components but also the authenticity of Chinese medicinal materials. Angelicae Sinensis Radix, Astragali Radix, Codonopsis Radix, Glycyrrhizae Radix et Rhizoma and Rhei Radix et Rhizoma are traditional Chinese medicinal materials and important sources of clinical medicine. In recent years, a growing body of research has addressed the microbiomes of these medicinal plants. To integrate the data and results of these studies and to promote comparative work, a literature review and information extraction analysis were carried out to construct a knowledge base of the medicinal plant microbiome that supports research on medicinal plant quality and authenticity. The database describes medicinal plant microorganisms by name, host plant, plant source in literature, classification, genus, family, order, class, phylum, function/biological role, technique, sequence length, NCBI reference serial number/GenBank, references and corresponding links. The interface supports queries on the microbiome content of the medicinal plants listed above. The database is intended to provide a research basis for the development and utilization of the microbiome of medicinal plants and a reference for new methods of quality control and authenticity evaluation.
In Version 2, an additional 11 pieces of information were incorporated for Codonopsis Radix. In Version 3, the number of endophytes in the database was updated to 350. In Version 4, the category 'plant source in literature' was added to distinguish the origin of host plants; the names of host plants were also unified as Latin names, and one duplicate record was removed. In Version 5, we added a processing file for the data in MPMD, in which we counted the frequency of each endophyte and analyzed parameters such as the proportion of high-frequency endophytes occurring in the five traditional medicinal plants. In Version 6, we corrected errors that appeared in the descriptions of the previous versions and provided a new version of MPMD.
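
The frequency analysis mentioned for Version 5 can be pictured with a short Python sketch. The processing file itself is not shown in this listing, so everything here is an assumption for illustration: the record layout of (endophyte, host plant) pairs, the function names, and the threshold of two occurrences used to call an endophyte "high-frequency" are hypothetical.

```python
from collections import Counter


def endophyte_frequencies(records):
    """Count how often each endophyte name occurs across all records."""
    return Counter(name for name, _host in records)


def high_frequency_share(records, min_count=2):
    """Per host plant, the fraction of records whose endophyte occurs
    at least min_count times in the whole record list."""
    freq = endophyte_frequencies(records)
    totals, high = Counter(), Counter()
    for name, host in records:
        totals[host] += 1
        if freq[name] >= min_count:
            high[host] += 1
    return {host: high[host] / totals[host] for host in totals}


# Made-up records, not taken from MPMD.
records = [
    ("Bacillus", "Astragali Radix"),
    ("Bacillus", "Codonopsis Radix"),
    ("Fusarium", "Astragali Radix"),
]
print(endophyte_frequencies(records).most_common(1))
print(high_frequency_share(records))
```

The records must be a list or another re-iterable collection, because the sketch walks over them twice: once to count names and once to compute the per-host shares.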
Genome Taxonomy Database r226.0
gbif.org
Updated Jan 28, 2026
Cite
Donovan Parks; Phil Hugenholtz (2026). Genome Taxonomy Database r226.0 [Dataset]. http://doi.org/10.15468/dpzg84
Unique identifier
https://doi.org/10.15468/dpzg84
Dataset updated
Jan 28, 2026
Dataset provided by
Global Biodiversity Information Facility (https://www.gbif.org/)
The University of Queensland
Authors
Donovan Parks; Phil Hugenholtz
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Genome Taxonomy Database (GTDB) is an initiative to establish a standardised microbial taxonomy based on genome phylogeny, primarily funded by the Australian Research Council via a Laureate Fellowship (FL150100038) and Discovery Project (DP220100900), with the welcome assistance of strategic funding from The University of Queensland. The genomes used to construct the phylogeny are obtained from RefSeq and GenBank, and GTDB releases are indexed to RefSeq releases, starting with release 76. Importantly and increasingly, this dataset includes draft genomes of uncultured microorganisms obtained from metagenomes and single cells, ensuring improved genomic representation of the microbial world. All genomes are independently quality controlled using CheckM before inclusion in GTDB. The GTDB taxonomy is based on genome trees inferred using FastTree from an aligned concatenated set of 120 single copy marker proteins for Bacteria, and using IQ-TREE from a concatenated set of 53 (starting with R07-RS207) or 122 (prior to R07-RS207) marker proteins for Archaea. Additional marker sets, including concatenated ribosomal proteins and ribosomal RNA genes, are also used to cross-validate tree topologies. NCBI taxonomy was initially used to decorate the genome tree via tax2tree and subsequently used as a reference source of new taxonomic opinions, including new names. The 16S rRNA-based Greengenes and SILVA taxonomies were initially used to supplement the taxonomy, particularly in regions of the tree with no cultured representatives; however, genome assembly identifiers are now used to create placeholder names for uncultured taxa. LPSN is used as the primary nomenclatural reference for establishing naming priorities and nomenclature types. All taxonomic ranks except species are normalised using PhyloRank, and the taxonomy is manually curated to remove polyphyletic groups. Polyphyly and rank evenness can be visualised in PhyloRank plots.
Species were originally delineated based on phylogeny and rank normalization, but this was replaced with an ANI-based method (starting with R04-RS89) to enable scalable and automated assignment of genomes to species clusters. The GTDB taxonomy can be queried and downloaded through a number of tools at https://gtdb.ecogenomic.org/
16S_Sample-Matched Databases
figshare.com
txt
Updated Jan 13, 2023
Cite
Elliot Lee (2023). 16S_Sample-Matched Databases [Dataset]. http://doi.org/10.6084/m9.figshare.20164352.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.20164352.v1
Dataset updated
Jan 13, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Elliot Lee
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
FASTA files for the 16S_Sample-Matched databases. These files are for the initial database, which was used for a first search of the data; proteins that matched at least one spectrum were then used to create the refined 16S_Sample-Matched databases.
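
The refinement step described above amounts to filtering a FASTA file down to the proteins that had at least one spectrum match. The Python sketch below shows one way this could be done, under assumptions not stated in the dataset description: the function names are hypothetical, the protein ID is taken to be the first whitespace-delimited token of each header, and the set of matched IDs is assumed to come from whatever search tool produced the first-pass results.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def refine_database(fasta_text, matched_ids):
    """Keep only entries whose ID (first header token) is in matched_ids."""
    kept = []
    for header, seq in read_fasta(fasta_text):
        tokens = header.split()
        ident = tokens[0] if tokens else ""
        if ident in matched_ids:
            kept.append(f">{header}\n{seq}")
    return "\n".join(kept) + ("\n" if kept else "")


# Made-up example: only P1 had a spectrum match in the first search.
initial = ">P1 protein one\nMKV\nLLA\n>P2 protein two\nGGG\n"
print(refine_database(initial, {"P1"}), end="")
```

Sequences are written back on a single line for simplicity; a real pipeline might re-wrap them or stream large files from disk instead of holding the text in memory.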
Data supporting publication: MiFoDB, a microbial food metagenomics reference...
zenodo.org
bin
Updated Dec 12, 2023
Cite
Elisa B. Caffrey; Matthew R. Olm; Justin L. Sonnenburg (2023). Data supporting publication: MiFoDB, a microbial food metagenomics reference database, enables high-resolution analysis of fermented food microbial dynamics [Dataset]. http://doi.org/10.5281/zenodo.8144860
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.8144860
Dataset updated
Dec 12, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Elisa B. Caffrey; Matthew R. Olm; Justin L. Sonnenburg
Description
MiFoDB (Microbial Foods Database) is a primary reference database which includes 675 assembled MAGs and RefSeq bacterial, yeast, fungal, and substrate genomes from fermented foods.
Agricultural Research Service Culture Collection (NRRL - Northern Regional...
catalog.data.gov
Updated Dec 2, 2025
Cite
Agricultural Research Service (2025). Agricultural Research Service Culture Collection (NRRL - Northern Regional Research Laboratory) Database [Dataset]. https://catalog.data.gov/dataset/agricultural-research-service-culture-collection-nrrl-northern-regional-research-laborator-408d8
Dataset updated
Dec 2, 2025
Dataset provided by
Agricultural Research Service (https://www.ars.usda.gov/)
Description
The ARS Culture Collection is one of the largest public collections of microorganisms in the world, containing approximately 93,000 strains of bacteria and fungi. The collection is split into subcollections of molds, prokaryotes, and yeasts. In addition, the online catalog is searchable by genus, species, subvar type, and subspecies. The collection is housed within the Mycotoxin Prevention and Applied Microbiology Research Unit at the National Center for Agricultural Utilization Research in Peoria, Illinois. The scientists and staff of the ARS Culture Collection conduct and facilitate microbiological research that advances agricultural production, food safety, public health, and economic development. These goals are pursued through in-house research that improves understanding and utilization of microbiological diversity and through efforts to enhance the value and accessibility of microbial accessions in the Agricultural Research Service Culture Collection. Resources in this dataset: Resource Title: The ARS Culture (NRRL) Collection Online Catalog. File Name: Web Page, url: https://nrrl.ncaur.usda.gov/. Online catalog and database server for the ARS Culture Collection (NRRL).

Cite

(2006). ComBase: A Combined Database For Predictive Microbiology [Dataset]. http://identifiers.org/RRID:SCR_008181

ComBase: A Combined Database For Predictive Microbiology

RRID:SCR_008181, nif-0000-21095, r3d100010878, ComBase: A Combined Database For Predictive Microbiology (RRID:SCR_008181), ComBase

Explore at:

8 scholarly articles cite this dataset (View in Google Scholar)

Unique identifier

https://identifiers.org/RRID:SCR_008181

Dataset updated

Nov 19, 2006

Description

A database of information about how microorganisms respond to different environments. The information in ComBase is referred to as quantitative microbiological data since it describes how levels of microorganisms, both spoilage organisms and pathogens, change over the course of time. The primary goal of the ComBase consortium is to improve efficiency in locating specific microbiological information, provide a more rapid means to compare data from different laboratories, and reduce unnecessary redundancy in conducting microbiological studies. ComBase was launched in 2003. The ComBase Initiative is a collaboration between the Food Standards Agency and the Institute of Food Research from the United Kingdom; the USDA Agricultural Research Service and its Eastern Regional Research Center from the United States; and the Food Safety Center in Australia. Its purpose is to make data and predictive tools on microbial responses to food environments freely available via web-based software. The ComBase Database (accessible via the ComBase Browser) consists of thousands of microbial growth and survival curves that have been collated in research establishments and from publications. They form the basis for numerous microbial models presented in ComBase Predictor, a useful tool for industry, academia and regulatory agencies. The models can be used in developing new food technologies while maintaining food safety; in teaching and research; and in assessing the microbial risk in foods or setting up new guidelines.
