ComBase: A Combined Database For Predictive Microbiology
rrid.site
scicrunch.org
Updated Jun 17, 2025
Cite
(2025). ComBase: A Combined Database For Predictive Microbiology [Dataset]. http://identifiers.org/RRID:SCR_008181
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_008181
Dataset updated
Jun 17, 2025
Description
A database of information about how microorganisms respond to different environments. The information in ComBase is referred to as quantitative microbiological data since it describes how levels of microorganisms, both spoilage organisms and pathogens, change over the course of time. The primary goal of the ComBase consortium is to improve efficiency in locating specific microbiological information, provide a more rapid means to compare data from different laboratories, and reduce unnecessary redundancy in conducting microbiological studies. ComBase was launched in 2003. The ComBase Initiative is a collaboration between the Food Standards Agency and the Institute of Food Research from the United Kingdom; the USDA Agricultural Research Service and its Eastern Regional Research Center from the United States; and the Food Safety Center in Australia. Its purpose is to make data and predictive tools on microbial responses to food environments freely available via web-based software. The ComBase Database (accessible via the ComBase Browser) consists of thousands of microbial growth and survival curves that have been collated in research establishments and from publications. They form the basis for numerous microbial models presented in ComBase Predictor, a useful tool for industry, academia and regulatory agencies. The models can be used in developing new food technologies while maintaining food safety; in teaching and research; and in assessing the microbial risk in foods or setting up new guidelines.
Predictive Microbiology Information Portal (PMIP)
agdatacommons.nal.usda.gov
bin
Updated Feb 8, 2024
Cite
USDA Food Safety & Inspection Service (FSIS), USDA Agricultural Research Service (ARS) (2024). Predictive Microbiology Information Portal (PMIP) [Dataset]. http://doi.org/10.15482/USDA.ADC/1178077
Available download formats: bin
Unique identifier
https://doi.org/10.15482/USDA.ADC/1178077
Dataset updated
Feb 8, 2024
Dataset provided by
United States Department of Agriculture (http://usda.gov/)
Food Safety and Inspection Service (http://www.fsis.usda.gov/)
Authors
USDA Food Safety & Inspection Service (FSIS), USDA Agricultural Research Service (ARS)
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Description
PMIP provides access to predictive models for foodborne pathogens, relevant regulatory policies and guidelines, and microbial data related to pathogenic and spoilage microorganisms in food products. The models in the Predictive Microbiology Information Portal are mainly from the Pathogen Modeling Program (PMP) of the USDA Agricultural Research Service (ARS) - Eastern Regional Research Center (ERRC), and currently 15 models are in the portal. The main sources of rules/regulations are links to the USDA - Food Safety and Inspection Service and Food and Drug Administration websites. The microbial growth data are from ComBase, which contains about 65,000 data points. It is a relational database that is jointly developed and maintained by the Food Research Institute of the UK, USDA-ARS, and the Center of Excellence for Food Safety, Australia. The portal provides a search function that users can use to obtain the specific information that interests them. The tutorial provides brief instructions and examples on how to navigate the portal and retrieve necessary information. The PMIP links users to numerous and diverse resources associated with models (PMP), databases (ComBase), regulatory requirements, and food safety principles. Resources in this dataset: Resource Title: Predictive Microbiology Information Portal (PMIP) Web Site. File Name: Web Page, url: https://portal.errc.ars.usda.gov/
Microbial Protein Interaction Database
registry.identifiers.org
bioregistry.io
Updated May 23, 2025
Cite
(2025). Microbial Protein Interaction Database [Dataset]. https://registry.identifiers.org/registry/mpid
Explore at:
Dataset updated
May 23, 2025
Description
The microbial protein interaction database (MPIDB) provides physical microbial interaction data. The interactions are manually curated from the literature or imported from other databases, and are linked to supporting experimental evidence, as well as evidence based on interaction conservation, protein complex membership, and 3D domain contacts.
ARS Microbial Genomic Sequence Database Server.
datadiscoverystudio.org
agdatacommons.nal.usda.gov
Updated Feb 4, 2018
Cite
(2018). ARS Microbial Genomic Sequence Database Server. [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/a4a655799e5e43f8a894cb618456aa07/html
Explore at:
Dataset updated
Feb 4, 2018
Description
This database server is supported in fulfilment of the research mission of the Mycotoxin Prevention and Applied Microbiology Research Unit at the National Center for Agricultural Utilization Research in Peoria, Illinois. The linked website provides access to gene sequence databases for various groups of microorganisms, such as Streptomyces species or Aspergillus species and their relatives, that are the product of ARS research programs. The sequence databases are organized in the BIGSdb (Bacterial Isolate Genomic Sequence Database) software package developed by Keith Jolley and Martin Maiden at Oxford University.
Data from: ComBase: A Web Resource for Quantitative and Predictive Food...
catalog.data.gov
datasets.ai
Updated Jun 5, 2025
Cite
Agricultural Research Service (2025). ComBase: A Web Resource for Quantitative and Predictive Food Microbiology [Dataset]. https://catalog.data.gov/dataset/combase-a-web-resource-for-quantitative-and-predictive-food-microbiology-d652f
Explore at:
Dataset updated
Jun 5, 2025
Dataset provided by
Agricultural Research Service
Description
ComBase includes a systematically formatted database of quantified microbial responses to the food environment with more than 65,000 records, and is used for: informing the design of food safety risk management plans; producing Food Safety Plans and HACCP plans; reducing food waste; and assessing microbiological risk in foods. The ComBase Browser enables you to search thousands of microbial growth and survival curves that have been collated in research establishments and from publications. The ComBase Predictive Models are a collection of software tools based on ComBase data to predict the growth or inactivation of microorganisms as a function of environmental factors such as temperature, pH and water activity in broth. Interested users can also contribute growth or inactivation data via the Donate Data page, which includes instructional videos, a data template and sample, and an Excel demo file of data and macros for checking data format and syntax. Resources in this dataset: Resource Title: Website Pointer to ComBase. File Name: Web Page, url: https://www.combase.cc/index.php/en/ ComBase is an online tool for quantitative food microbiology. Its main features are the ComBase database and ComBase models, and it can be accessed on any web platform, including mobile devices. The focus of ComBase is describing and predicting how microorganisms survive and grow under a variety of primarily food-related conditions. ComBase is a highly useful tool for food companies to understand safer ways of producing and storing foods. This includes developing new food products and reformulating foods, designing challenge test protocols, producing Food Safety plans, and helping public health organizations develop science-based food policies through quantitative risk assessment. Over 60,000 records have been deposited into ComBase, describing how food environments, such as temperature, pH, and water activity, as well as other factors (e.g. preservatives and atmosphere) affect the growth of bacteria.
Each data record shows users how bacterial populations change for a particular combination of environmental factors. Mathematical models (the ComBase Predictor and Food models) were developed on systematically generated data to predict how various organisms grow or survive under various conditions.
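As a rough illustration of what predicting growth as a function of environmental factors can look like, the sketch below combines a Ratkowsky-type square-root model for the temperature dependence of the maximum specific growth rate with simple log-linear growth. This is not presented as the model used by ComBase Predictor; the function names and the parameters b and t_min are hypothetical, and pH and water activity are ignored for brevity.

```python
import math


def sqrt_model_growth_rate(temp_c: float, b: float, t_min: float) -> float:
    """Ratkowsky-type square-root model: sqrt(mu_max) = b * (T - T_min).

    temp_c and t_min are in degrees Celsius; b is a fitted slope (hypothetical
    here). Returns mu_max in 1/h, or 0.0 at or below the notional minimum
    growth temperature.
    """
    if temp_c <= t_min:
        return 0.0
    return (b * (temp_c - t_min)) ** 2


def log10_count(log10_n0: float, mu_max: float, hours: float) -> float:
    """Log-linear growth without lag or stationary phase (a simplification).

    log10 N(t) = log10 N0 + mu_max * t / ln(10)
    """
    return log10_n0 + mu_max * hours / math.log(10)


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from ComBase.
    mu = sqrt_model_growth_rate(temp_c=15.0, b=0.03, t_min=4.0)
    print(f"mu_max at 15 C: {mu:.4f} 1/h")
    print(f"log10 count after 24 h: {log10_count(3.0, mu, 24.0):.2f}")
```

A real predictive model would also need a lag phase, a maximum population density, and terms for the other environmental factors.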
NEMiD: A Web-Based Curated Microbial Diversity Database with Geo-Based...
plos.figshare.com
tiff
Updated Jun 2, 2023
Cite
Kaushik Bhattacharjee; Santa Ram Joshi (2023). NEMiD: A Web-Based Curated Microbial Diversity Database with Geo-Based Plotting [Dataset]. http://doi.org/10.1371/journal.pone.0094088
Available download formats: tiff
Unique identifier
https://doi.org/10.1371/journal.pone.0094088
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS ONE
Authors
Kaushik Bhattacharjee; Santa Ram Joshi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The majority of the Earth's microbes remain unknown, and their potential utility cannot be exploited until they are discovered and characterized. They provide wide scope for the development of new strains as well as biotechnological uses. The documentation and bioprospection of microorganisms carry enormous significance considering their relevance to human welfare. This calls for an urgent need to develop a database with emphasis on the microbial diversity of the largest untapped reservoirs in the biosphere. The data annotated in the North-East India Microbial database (NEMiD) were obtained by the isolation and characterization of microbes from different parts of the Eastern Himalayan region. The database was constructed as a relational database management system (RDBMS) for data storage in MySQL in the back-end on a Linux server and implemented in an Apache/PHP environment. This database provides a base for understanding the soil microbial diversity pattern in this megabiodiversity hotspot and indicates the distribution patterns of various organisms along with identification. The NEMiD database is freely available at www.mblabnehu.info/nemid/.
Microbiology Data Problems 2023
qubeshub.org
Updated Feb 22, 2023
Cite
Charles Deutch (2023). Microbiology Data Problems 2023 [Dataset]. http://doi.org/10.25334/DS8P-7093
Explore at:
Unique identifier
https://doi.org/10.25334/DS8P-7093
Dataset updated
Feb 22, 2023
Dataset provided by
QUBES
Authors
Charles Deutch
Description
This project focuses on the use of data analysis problems to introduce students to specific topics in microbiology and to give them practice in the interpretation of figures and tables of data. Each problem is based on a single journal article and includes five to eight multiple-choice questions.
RefSoil Database
figshare.com
application/gzip
Updated May 31, 2023
Cite
Jinlyung Choi (2023). RefSoil Database [Dataset]. http://doi.org/10.6084/m9.figshare.4362812.v2
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.4362812.v2
Dataset updated
May 31, 2023
Dataset provided by
figshare
Authors
Jinlyung Choi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
RefSoil (reference soil) database. Protein-coding nucleotide and amino acid sequences in FASTA format.
bacteria.protein.fa.gz: bacteria protein amino acid sequences
bacteria.nu.fa.gz: bacteria (CDS) nucleotide sequences
archaea.protein.fa.gz: archaea protein amino acid sequences
archaea.nu.fa.gz: archaea (CDS) nucleotide sequences
rast_genbank.tar.gz: RAST annotated genbank files
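Gzipped FASTA files such as these can be streamed without unpacking them first. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the example path assumes one of the files listed above has been downloaded to the working directory.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA text lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fasta_gz(path: str) -> Iterator[Tuple[str, str]]:
    """Stream records from a gzip-compressed FASTA file."""
    with gzip.open(path, "rt") as handle:
        yield from parse_fasta(handle)


if __name__ == "__main__":
    # Assumes the file was downloaded locally; the path is an example only.
    for name, seq in read_fasta_gz("bacteria.protein.fa.gz"):
        print(name, len(seq))
        break
```

Because the parser is a generator, memory use stays flat even for large files.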
MARMICRODB database for taxonomic classification of (marine) metagenomes
zenodo.org
explore.openaire.eu
application/gzip, bin +3
Updated Mar 20, 2020
Cite
Shane L Hogle (2020). MARMICRODB database for taxonomic classification of (marine) metagenomes [Dataset]. http://doi.org/10.5281/zenodo.3520509
Available download formats: bin, application/gzip, tsv, html, bz2
Unique identifier
https://doi.org/10.5281/zenodo.3520509
Dataset updated
Mar 20, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Shane L Hogle
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction:
This sequence database (MARMICRODB) was introduced in the publication JW Becker, SL Hogle, K Rosendo, and SW Chisholm. 2019. Co-culture and biogeography of Prochlorococcus and SAR11. ISME J. doi:10.1038/s41396-019-0365-4. Please see the original publication and its associated supplementary material for the original description of this resource.

Motivation:
We needed a reference database to annotate shotgun metagenomes from the Tara Oceans project [1], the GEOTRACES cruises GA02, GA03, GA10, and GP13, and the HOT and BATS time series [2]. Our interests are primarily in quantifying and annotating the free-living, oligotrophic bacterial groups Prochlorococcus, Pelagibacterales/SAR11, SAR116, and SAR86 from these samples using the protein classifier tool Kaiju [3]. Kaiju’s sensitivity and classification accuracy depend on the composition of the reference database, and highest sensitivity is achieved when the reference database contains a comprehensive representation of expected taxa from an environment/sample of interest. However, the speed of the algorithm decreases as database size increases. Therefore, we aimed to create a reference database that maximized the representation of sequences from marine bacteria, archaea, and microbial eukaryotes, while minimizing (but not excluding) the sequences from clinical, industrial, and terrestrial host-associated samples.

Results/Description:
MARMICRODB consists of 56 million non-redundant protein sequences from 18769 bacterial/archaeal/eukaryote genome and transcriptome bins and 7492 viral genomes optimized for use with the protein homology classifier Kaiju [3]. To ensure maximum representation of marine bacteria, archaea, and microbial eukaryotes, we included translated genes/transcripts from 5397 representative “specI” species clusters from the proGenomes database [4]; 113 transcriptomes from the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) [5]; 10509 metagenome assembled genomes from the Tara Oceans expedition [6,7], the Red Sea [8], the Baltic Sea [9], and other aquatic and terrestrial sources [10]; 994 isolate genomes from the Genomic Encyclopedia of Bacteria and Archaea [11]; 7492 viral genomes from NCBI RefSeq [12]; 786 bacterial and archaeal genomes from MarRef [13]; and 677 marine single cell genomes [14]. In order to annotate metagenomic reads at the clade/ecotype level (subspecies) for the focal taxa Prochlorococcus, Synechococcus, SAR11/Pelagibacterales, SAR86, and SAR116, we generated custom MARMICRODB taxonomies based on curated genome phylogenies for each group. The curated phylogenies, Kaiju formatted Burrows-Wheeler index, translated genes, the custom taxonomy hierarchy, an interactive kronaplot of the taxonomic composition, and scripts and instructions for how to use or rebuild the resource are available from 10.5281/zenodo.3520509.

Methods:
The curation and quality control of MARMICRODB single cell, metagenome assembled, and isolate genomes was performed as described in [15]. Briefly, we downloaded all MARMICRODB genomes as raw nucleotide assemblies from NCBI. We determined an initial genome taxonomy for these assemblies using checkM with the default lineage workflow [16]. All genome bins met the completion/contamination thresholds outlined in prior studies [7,17]. For single cell and metagenome assembled genomes, especially those from Tara Oceans Mediterranean Sea samples [18], we used the GTDB-Tk classification workflow [19] to verify the taxonomic fidelity of each genome bin. We then selected genomes with a checkM taxonomic assignment of Prochlorococcus, Synechococcus, SAR11/Pelagibacterales, SAR86, and SAR116 for further analysis and confirmed taxonomic assignment using blast matches to known Prochlorococcus/Synechococcus ITS sequences and by matching 16S sequences to the SILVA database [20]. To refine our estimates of completeness/contamination of Prochlorococcus genome bins we created a custom set of 730 single copy protein families (available from 10.5281/zenodo.3719132) from closed, isolate Prochlorococcus genomes [21] for quality assessments with checkM. For Synechococcus we used the CheckM taxonomic-specific workflow with the genus Synechococcus. After the custom CheckM quality control, we excluded any genome bins from downstream analysis that had an estimated quality < 30, defined as %completeness − 5 × %contamination, resulting in 18769 genome/transcriptome bins. We predicted genes in the resulting genome bins using prodigal [22] and excluded protein sequences with lengths less than 20 or greater than 20000 amino acids, removed non-standard amino acid residues, and condensed redundant protein sequences to a single representative sequence to which we assigned a lowest common ancestor (LCA) taxonomy identifier from the NCBI taxonomy database [23].
The resulting protein sequences were compiled and used to build a Kaiju [3] search database.
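The quality cutoff and protein clean-up described above can be sketched in a few lines. This is an illustrative reading of the text, not the authors' scripts: the function names are hypothetical, the order of the length filter and residue stripping is an assumption, stripping a trailing "*" stop symbol is an assumption about the gene-caller output, and "redundant" is taken to mean exact sequence identity. The LCA assignment step is not implemented; the sketch only records which source names share each unique sequence.

```python
from typing import Dict, Iterable, List, Tuple

# The 20 standard amino acids; anything else is treated as non-standard.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def genome_quality(completeness_pct: float, contamination_pct: float) -> float:
    """Quality as defined in the text: %completeness - 5 x %contamination."""
    return completeness_pct - 5.0 * contamination_pct


def passes_quality(completeness_pct: float, contamination_pct: float,
                   min_quality: float = 30.0) -> bool:
    """Bins with quality < 30 are excluded, so quality >= 30 is kept."""
    return genome_quality(completeness_pct, contamination_pct) >= min_quality


def clean_proteins(records: Iterable[Tuple[str, str]],
                   min_len: int = 20,
                   max_len: int = 20000) -> Dict[str, List[str]]:
    """Filter by length, strip non-standard residues, and collapse duplicates.

    Returns a mapping from each unique cleaned sequence to the names of the
    input records that produced it (the input to a later LCA step).
    """
    unique: Dict[str, List[str]] = {}
    for name, seq in records:
        seq = seq.upper().rstrip("*")  # assumption: drop trailing stop symbol
        if not (min_len <= len(seq) <= max_len):
            continue
        seq = "".join(aa for aa in seq if aa in STANDARD_AA)
        unique.setdefault(seq, []).append(name)
    return unique


if __name__ == "__main__":
    print(passes_quality(85.0, 3.0))   # quality 70
    print(passes_quality(50.0, 5.0))   # quality 25
```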

The above filtering criteria resulted in 605 Prochlorococcus, 96 Synechococcus, 186 SAR11/Pelagibacterales, 60 SAR86, and 59 SAR116 high-quality genome bins. We constructed a high quality fixed reference phylogenetic tree for each taxonomic group based on genomes manually selected for completeness and phylogenetic diversity. For example, the Prochlorococcus and Synechococcus genomes for the fixed reference phylogeny are estimated > 90% complete, and SAR11 genomes are estimated > 70% complete. We created multiple sequence alignments of phylogenetically conserved genes from these genomes using the GTDB-Tk pipeline [19] with default settings. The pipeline identifies conserved proteins (120 bacterial proteins) and generates concatenated multi-protein alignments [17] from the genome assemblies using hmmalign from the hmmer software suite. We further filtered the resulting alignment columns using the bacterial and archaeal alignment masks from [17] (http://gtdb.ecogenomic.org/downloads). We removed columns represented by fewer than 50% of all taxa and/or columns with no single amino acid residue occurring at a frequency greater than 25%. We trimmed the alignments using trimal [24] with the automated -gappyout option to trim columns based on their gap distribution. We inferred reference phylogenies using multithreaded RAxML [25] with the GAMMA model of rate heterogeneity, empirically determined base frequencies, and the LG substitution model [26] (PROTGAMMALGF). Branch support is based on 250 resampled bootstrap trees. This tree was then pruned to only allow a maximum average distance to the closest leaf (ADCL) of 0.003 to reduce the phylogenetic redundancy in the tree [27]. We then “placed” genomes that either did not pass the completeness threshold or were considered phylogenetically redundant by ADCL within the fixed reference phylogeny for each group using pplacer [28], representing each placed genome as a pendant edge in the final tree.
We then examined the resulting tree and manually selected clade/ecotype cutoffs to be as consistent as possible with clade definitions previously outlined for these groups [29–32]. We then gave clades from each taxonomic group custom taxonomic identifiers and we added these identifiers to the MARMICRODB Kaiju taxonomic hierarchy.
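The column filter described above (drop columns present in fewer than 50% of taxa, or with no residue above 25% frequency) can be sketched as follows. This is a plain-Python illustration rather than the GTDB-Tk implementation; the function names are hypothetical, and computing the residue frequency relative to the non-gap residues in a column (rather than all taxa) is an assumption.

```python
from collections import Counter
from typing import List, Sequence

GAP_CHARS = set("-.")


def keep_columns(alignment: Sequence[str],
                 min_taxa_frac: float = 0.5,
                 min_consensus: float = 0.25) -> List[int]:
    """Return indices of alignment columns that pass both filters."""
    if not alignment:
        return []
    n_taxa = len(alignment)
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    kept = []
    for col in range(length):
        residues = [row[col] for row in alignment if row[col] not in GAP_CHARS]
        # Filter 1: column must be represented by at least 50% of taxa.
        if len(residues) < min_taxa_frac * n_taxa:
            continue
        # Filter 2: the most common residue must exceed 25% frequency
        # (assumption: frequency is taken over non-gap residues).
        top_count = Counter(residues).most_common(1)[0][1]
        if top_count / len(residues) <= min_consensus:
            continue
        kept.append(col)
    return kept


def filter_alignment(alignment: Sequence[str], **kwargs) -> List[str]:
    """Apply keep_columns and return the trimmed sequences."""
    cols = keep_columns(alignment, **kwargs)
    return ["".join(row[c] for c in cols) for row in alignment]


if __name__ == "__main__":
    toy = ["AC-A", "AD-A", "AE-D", "AFGE"]
    print(keep_columns(toy))
    print(filter_alignment(toy))
```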

Software/databases used:
checkM v1.0.11[16]
HMMERv3.1b2 (http://hmmer.org/)
prodigal v2.6.3 [22]
trimAl v1.4.rev22 [24]
AliView v1.18.1 [33] [34]
Phyx v0.1 [35]
RAxML v8.2.12 [36]
Pplacer v1.1alpha [28]
GTDB-Tk v0.1.3 [19]
Kaiju v1.6.0 [34]
GTDB RS83 (https://data.ace.uq.edu.au/public/gtdb/data/releases/release83/83.0/)
NCBI Taxonomy (accessed 2018-07-02) [23]
TIGRFAM v14.0 [37]
PFAM v31.0 [38]

Discussion/Caveats:
MARMICRODB is optimized for metagenomic samples from the marine environment, in particular planktonic microbes from the pelagic euphotic zone. We expect this database may also be useful for classifying other types of marine metagenomic samples (for example mesopelagic, bathypelagic, or even benthic or marine host-associated), but it has not been tested as such. The original purpose of this database was to quantify clades/ecotypes of Prochlorococcus, Synechococcus, SAR11/Pelagibacterales, SAR86, and SAR116 in metagenomes from the Tara Oceans Expedition and the GEOTRACES project. We carefully annotated and quality controlled genomes from these five groups, but the processing of the other marine taxa was largely automated and unsupervised. Taxonomy for other groups was copied over from the Genome Taxonomy Database (GTDB) [19,39] and NCBI Taxonomy [23], so any inconsistencies in those databases will be propagated to MARMICRODB. For most use cases MARMICRODB can probably be used unmodified, but if the user’s goal is to focus on a particular organism/clade that we did not curate in the database then the user may wish to spend some time curating those genomes (i.e., checking for contamination, dereplicating, building a genome phylogeny for custom taxonomy node assignment). Currently the custom taxonomy is hardcoded in the MARMICRODB.fmi index, but if users wish to modify MARMICRODB by adding or removing genomes, or reconfiguring taxonomic ranks, the names.dmp and nodes.dmp files can easily be modified as well as the fasta file of protein sequences. However, the Kaiju index will need to be rebuilt, and user will require a high
International Journal of Systematic and Evolutionary Microbiology (IJSEM)...
figshare.com
txt
Updated Dec 6, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Albert Barberan (2016). International Journal of Systematic and Evolutionary Microbiology (IJSEM) phenotypic database [Dataset]. http://doi.org/10.6084/m9.figshare.4272392.v3
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.4272392.v3
Dataset updated
Dec 6, 2016
Dataset provided by
Figshare (http://figshare.com/)
Authors
Albert Barberan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The International Journal of Systematic and Evolutionary Microbiology (IJSEM) is the official publication of the International Committee on Systematics of Prokaryotes and the Bacteriology and Applied Microbiology Division of the International Union of Microbiological Societies, and the official journal of record for novel bacterial and archaeal taxa (http://ijs.microbiologyresearch.org/content/journal/ijsem). We manually searched IJSEM articles to extract phenotypic, metabolic and environmental tolerance data of bacterial strains from 2004 to 2014.

We focused on the most recent entries as they presumably used standardized and state-of-the-art methods and up-to-date taxonomic nomenclature, most strains had easily retrievable 16S rRNA gene sequence data, and many strains also had publicly available genome sequence data. Data were manually collected using Google Forms, as the variable structure of the articles and inconsistent reporting of relevant information precluded the use of automatic text parsing algorithms.
MiST - Microbial Signal Transduction database
neuinfo.org
scicrunch.org
Updated Jan 29, 2022
Cite
(2022). MiST - Microbial Signal Transduction database [Dataset]. http://identifiers.org/RRID:SCR_003166
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_003166
Dataset updated
Jan 29, 2022
Description
Database which contains the signal transduction proteins for complete and draft bacterial and archaeal genomes. The MiST2 database identifies and catalogs the repertoire of signal transduction proteins in microbial genomes.
Version 4 (20230306) of the MALDI-ToF Mass Spectrometry Database for...
data.niaid.nih.gov
Updated Dec 27, 2024
Cite
Lasch, Peter (2024). Version 4 (20230306) of the MALDI-ToF Mass Spectrometry Database for Identification and Classification of Highly Pathogenic Microorganisms from the Robert Koch-Institute (RKI) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7702374
Explore at:
Dataset updated
Dec 27, 2024
Dataset provided by
Stämmler, Maren
Schneider, Andy
Lasch, Peter
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
(Version 20230306)

Version 4 (20230306) of the RKI MALDI-ToF mass spectra database is the third update of the original database (version 20161027, https://doi.org/10.5281/zenodo.163517). The RKI Database v.4 now contains a total of 11055 MALDI-ToF mass spectra from 1599 microbial strains of highly pathogenic (i.e. biosafety level 3, BSL-3) bacteria such as Bacillus anthracis, Brucella melitensis, Yersinia pestis, Burkholderia mallei / pseudomallei and Francisella tularensis as well as a selection of spectra of their close and distant relatives. The database can be used as a reference for the diagnosis of BSL-3 bacteria using proprietary and free software packages for MALDI-ToF MS-based microbial identification. The spectral data are provided as a zip archive (zenodo db 230306.zip) containing the original mass spectra in their native data format (Bruker Daltonics). Please refer to the pdf file (230306-ZENODO-Metadata.pdf) for information on cultivation conditions, sample preparation and details of the spectra acquisition. Please do not try to print this document (>1600 pages!).

Version 20230306 of the RKI database contains for the first time a file in btmsp format (230306_v4_RKI_DB_BSL3.btmsp). This file was generated using the MALDI Biotyper software (Bruker Daltonics) and contains a total of 1599 main spectra from the BSL-3 database in the proprietary data format of the MALDI Biotyper software. *.btmsp files can be imported and used for identification with this software solution. Note that the btmsp file available in database version 4 is broken and cannot be imported. Please refer to updated database versions (4.1, or 4.2) to download valid btmsp files.

The pkf files (230306_ZENODO_30Peaks_0.75.pkf, 230306_ZENODO_45Peaks_0.75.pkf) represent two versions of the MS peak list data in a Matlab compatible format. The latter data can be imported into MicrobeMS, a free Matlab-based software solution developed at the RKI. MicrobeMS can be used for the identification of microorganisms by MALDI-ToF MS and is available at https://wiki-ms.microbe-ms.com.

The RKI mass spectrometry database is updated regularly.

The author would like to thank the following individuals for providing microbial strains and species or mass spectra thereof. Without their help, this work would not have been possible.

Wolfgang Beyer - University of Hohenheim, Faculty of Agricultural Sciences, Stuttgart, Germany

Guido Werner - Robert Koch-Institute, Nosocomial Pathogens and Antibiotic Resistances (FG13), Wernigerode, Germany

Alejandra Bosch - CINDEFI, CONICET-CCT La Plata, Facultad de Ciencias Exactas, Universidad Nacional de La Plata, La Plata, Buenos Aires, Argentina

Michal Drevinek - National Institute for Nuclear, Biological and Chemical Protection, Milin, Czech Republic

Roland Grunow, Daniela Jacob, Silke Klee, Susann Dupke and Holger Scholz - Robert Koch-Institute, Highly Pathogenic Microorganisms (ZBS2), Berlin, Germany

Jörg Rau - Chemisches und Veterinäruntersuchungsamt Stuttgart, Fellbach, Germany

Jens Jacob - Robert Koch-Institute, Hospital Hygiene, Infection Prevention and Control (FG14), Berlin, Germany

Martin Mielke - Robert Koch-Institute, Department 1 - Infectious Diseases, Berlin, Germany

Monika Ehling-Schulz - Functional Microbiology, Institute of Microbiology, University of Veterinary Medicine, Vienna, Austria

Armand Paauw - Department of Medical Microbiology, CBRN protection, Universitair Medisch Centrum Utrecht, TNO, Rijswijk, The Netherlands

Herbert Tomaso – Friedrich-Löffler-Institut (FLI), Federal Research Institute for Animal Health, Jena, Germany

Gabriel Karner - Karner Düngerproduktion GmbH, Research & Development, Neulengbach, Austria

Rainer Borriss - Institute of Marine Biotechnology e.V. (IMaB), Greifswald, Germany

Le Thi Thanh Tam - Division of Plant Pathology and Phyto-Immunology, Plant Protection Research Institute, Hanoi, Socialist Republic of Vietnam

Xuewen Gao - College of Plant Protection, Nanjing Agricultural University, Key Laboratory of Integrated Management of Crop Diseases and Pests, Nanjing, People’s Republic of China
SBDI Sativa curated 16S GTDB database
researchdata.se
figshare.scilifelab.se
Updated May 7, 2025
Cite
Daniel Lundin; Anders Andersson (2025). SBDI Sativa curated 16S GTDB database [Dataset]. http://doi.org/10.17044/SCILIFELAB.14869077
Explore at:
Unique identifier
https://doi.org/10.17044/SCILIFELAB.14869077
Dataset updated
May 7, 2025
Dataset provided by
Linnaeus University
Authors
Daniel Lundin; Anders Andersson
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data in this repository is the result of vetting 16S sequences from the Genome Taxonomy Database (GTDB) release R10RS226 (r226) (https://gtdb.ecogenomic.org/; Parks et al. 2018) with the Sativa program (Kozlov et al. 2016) using the sbdi-phylomarkercheck Nextflow pipeline.

Using Sativa [Kozlov et al. 2016], 16S sequences from GTDB were checked so that their phylogenetic signal is consistent with their taxonomy.

Before calling Sativa, sequences longer than 2000 nucleotides or containing Ns were removed, and the reverse complement of each was calculated. Subsequently, sequences were aligned with HMMER [Eddy 2011] using the Barrnap [https://github.com/tseemann/barrnap] archaeal and bacterial 16S profiles respectively, and sequences containing more than 10% gaps were removed. The remaining sequences were analyzed with Sativa, and sequences that were not phylogenetically consistent with their taxonomy were removed.
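The pre-Sativa filters described above are simple enough to sketch directly. The code below is an illustration of the stated thresholds, not an excerpt from the sbdi-phylomarkercheck pipeline; all function names are hypothetical, and the alignment step with HMMER is assumed to have happened elsewhere before the gap filter is applied.

```python
# Translation table for complementing DNA; other characters are left unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def passes_prefilter(seq: str, max_len: int = 2000) -> bool:
    """Keep sequences of at most 2000 nucleotides that contain no N."""
    return len(seq) <= max_len and "N" not in seq.upper()


def gap_fraction(aligned: str) -> float:
    """Fraction of alignment positions that are gap characters."""
    if not aligned:
        return 0.0
    return sum(ch in "-." for ch in aligned) / len(aligned)


def passes_gap_filter(aligned: str, max_gap_frac: float = 0.10) -> bool:
    """Sequences with more than 10% gaps are removed."""
    return gap_fraction(aligned) <= max_gap_frac


if __name__ == "__main__":
    seq = "AACGTTGCA"
    if passes_prefilter(seq):
        print(seq, reverse_complement(seq))
    print(passes_gap_filter("ACGT-ACGTACGTACGTACG"))
```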

Files for the DADA2 (Callahan et al. 2016) methods assignTaxonomy and addSpecies are available, in three different versions each. The assignTaxonomy files contain taxonomy for domain, phylum, class, order, family, genus and species. (Note that it has been proposed that species assignment for short 16S sequences requires 100% identity (Edgar 2018), so use species assignments from assignTaxonomy with caution.) The versions differ in the maximum number of genomes that we included per species: 1, 5 or 20, indicated by "1genome", "5genomes" and "20genomes" in the file names respectively. Using the version with 20 genomes per species should increase the chances of identifying an exactly matching sequence with the addSpecies algorithm, while using a file with many genomes per species could potentially bias the taxonomic annotations at higher levels by assignTaxonomy. Our recommendation is hence to use the "1genome" files for assignTaxonomy and "20genomes" for addSpecies.

The fasta files are gzipped and contain 16S sequences: the assignTaxonomy files associate each sequence with a taxonomy hierarchy from domain to species, whereas the addSpecies files contain sequence identities and species names. There is also a fasta file with the original GTDB sequence names: sbdi-gtdb-sativa.r09rs220.20genomes.fna.gz.
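Because the files are gzipped fasta, they can be inspected without decompressing them first. The sketch below uses only the Python standard library. The function name is hypothetical, and the header is returned as-is because the exact header layout of each file type is not described here.

```python
import gzip
from typing import Iterator, Tuple


def read_fasta_gz(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a gzipped fasta file.

    The header is everything after '>' on the header line; sequence lines
    belonging to one record are joined into a single string.
    """
    header = None
    chunks = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

For example, `sum(1 for _ in read_fasta_gz(path))` counts the sequences in a file.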

Taxonomical annotation of 16S amplicons using this data is available as an optional argument to the nf-core/ampliseq Nextflow workflow: --dada_ref_taxonomy sbdi-gtdb (https://nf-co.re/ampliseq; Straub et al. 2020).

In addition to the fasta files, the workflow outputs phylogenetic trees by optimizing branch-lengths of the original phylogenomic GTDB trees based on a 16S sequence alignment. As not all species in GTDB will have correct 16S sequences, the GTDB trees are first subset to contain only species for which the species representative genome has a correct 16S sequence. Subsequently, branch lengths for the tree are optimized based on the original alignment of 16S sequences using IQTREE [Nguyen et al. 2015] with a GTR+F+I+G4 model. The alignment files end with .alnfna, the taxonomy files with .taxonomy.tsv and the tree files (newick-formatted) end with .brlenopt.newick. They will be made available in nf-core/ampliseq for phylogenetic placement.
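The file-name suffixes mentioned above are enough to sort the phylogenetic outputs by type. A minimal Python sketch follows. The suffixes come from the description, while the labels and the function name are hypothetical.

```python
from typing import Optional

# Suffixes as given in the description; the labels are illustrative only.
SUFFIX_KINDS = {
    ".alnfna": "alignment",
    ".taxonomy.tsv": "taxonomy",
    ".brlenopt.newick": "tree",
}


def classify_output(filename: str) -> Optional[str]:
    """Return the kind of phylogenetic output file, or None if unrecognized."""
    for suffix, kind in SUFFIX_KINDS.items():
        if filename.endswith(suffix):
            return kind
    return None
```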

The data will be updated approximately yearly, after the GTDB database is updated.

Version history

v10 (2025-04-30): Update versions in this text

v9 (2025-04-29): Update to GTDB R10-RS226

v8 (2025-02-18): Remove extra sequences from e.g. "1genome" files that appeared due to ties.

v7 (2024-06-25): Update to GTDB R09-RS220 from R08-RS214.

v6 (2024-04-24): Replace manual procedure with Nextflow pipeline. Update to GTDB R08-RS214 from R07-RS207.

v5 (2022-10-07): Add missing fasta file with original GTDB names.

v4 (2022-08-31): Update to GTDB R07-RS207 from R06-RS202

Acknowledgements

The computations were enabled by resources in project [NAISS 2023/22-601, SNIC 2022/22-500 and SNIC 2021/22-263] provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS) at UPPMAX, funded by the Swedish Research Council through grant agreement no. 2022-06725.

Computations were also enabled by resources provided by Dr. Maria Vila-Costa, Institute of Environmental Assessment and Water Research (IDAEA-CSIC), Barcelona.
MBGD - Microbial Genome Database
dknet.org
scicrunch.org
+2more
Updated Jun 23, 2025
(2025). MBGD - Microbial Genome Database [Dataset]. http://identifiers.org/RRID:SCR_012824/resolver
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_012824 https://identifiers.org/RRID:SCR_012824/resolver
Dataset updated
Jun 23, 2025
Description
MBGD is a database for comparative analysis of completely sequenced microbial genomes, the number of which is now growing rapidly. The aim of MBGD is to facilitate comparative genomics from various points of view, such as ortholog identification, paralog clustering, motif analysis and gene order comparison. The core function of MBGD is to create an orthologous or homologous gene cluster table. For this purpose, similarities between all genes are precomputed and stored in the database, in addition to gene annotations such as the function categories assigned by the original authors and the motifs found in the translated sequences. Using these homology data, MBGD dynamically creates an orthologous gene cluster table. Users can change the set of organisms or the cutoff parameters to create their own orthologous grouping. Based on this cluster table, users can further analyze multiple genomes from various points of view with functions such as global map comparison, local map comparison, multiple sequence alignment and phylogenetic tree construction.
Version 2 (20170523) of the MALDI-TOF Mass Spectrometry Database for...
data.niaid.nih.gov
Updated Dec 27, 2024
Lasch, Peter (2024). Version 2 (20170523) of the MALDI-TOF Mass Spectrometry Database for Identification and Classification of Highly Pathogenic Microorganisms from the Robert Koch-Institute (RKI) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_582602
Explore at:
Dataset updated
Dec 27, 2024
Dataset provided by
Stämmler, Maren
Schneider, Andy
Lasch, Peter
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
(Version 20170523)

Edit #1 (Nov 30, 2018): New database version (v.3 - 20181130) - available: 10.5281/zenodo.1880975

Edit #2 (Mar 06, 2023): New database version (v.4.2 - 20230306) - available: 10.5281/zenodo.7702375

Version 2 (20170523) of the RKI’s MALDI-TOF mass spectral database is an update of the original database (version 20161027, https://doi.org/10.5281/zenodo.163517). The RKI database contains mass spectral entries from highly pathogenic (biosafety level 3, BSL-3) bacteria such as Bacillus anthracis, Yersinia pestis, Burkholderia mallei, Burkholderia pseudomallei and Francisella tularensis as well as a selection of spectra from their close and more distant relatives. The database can be used as a reference for the diagnostics of BSL-3 bacteria using proprietary and free software packages for MALDI-TOF MS-based microbial identification. Spectral data are distributed as a 7-zip archive that contains the original mass spectra in their native data format (Bruker Daltonics). Please refer to the pdf file (170523-ZENODO-Metadata.pdf) to obtain information on the metadata of the spectra. Do not try to print this document (~1100 pages!).

The pkf-file (170523_ZENODO_Peaklist_30Peaks_1.6.pkf) contains the MS peak list data in a Matlab compatible format. The latter data file can be imported into MicrobeMS, a Matlab-based free-of-charge software solution developed at RKI. MicrobeMS is available from http://www.microbe-ms.com.

The RKI mass spectral database will be updated on a regular basis.

The author gratefully thanks the following persons for providing microbial strains and species. Without their help this work would not have been possible.

Wolfgang Beyer - University of Hohenheim, Faculty of Agricultural Sciences, Stuttgart, Germany

Guido Werner - Robert Koch-Institute, Nosocomial Pathogens and Antibiotic Resistances (FG13), Wernigerode, Germany

Alejandra Bosch - CINDEFI, CONICET-CCT La Plata, Facultad de Ciencias Exactas, Universidad Nacional de La Plata, La Plata, Buenos Aires, Argentina

Michal Drevinek - National Institute for Nuclear, Biological and Chemical Protection, Milin, Czech Republic

Roland Grunow - Robert Koch-Institute, Highly Pathogenic Microorganisms (ZBS2), Berlin, Germany

Daniela Jacob - Robert Koch-Institute, Highly Pathogenic Microorganisms (ZBS2), Berlin, Germany

Silke Klee - Robert Koch-Institute, Highly Pathogenic Microorganisms (ZBS2), Berlin, Germany

Jörg Rau - Chemisches und Veterinäruntersuchungsamt Stuttgart, Fellbach, Germany

Jens Jacob - Robert Koch-Institute, Hospital Hygiene, Infection Prevention and Control (FG14), Berlin, Germany

Martin Mielke - Robert Koch-Institute, Department 1 - Infectious Diseases, Berlin, Germany

Monika Ehling-Schulz - Functional Microbiology, Institute of Microbiology, University of Veterinary Medicine, Vienna, Austria

Armand Paauw - Department of Medical Microbiology, CBRN protection, Universitair Medisch Centrum Utrecht, TNO, Rijswijk, The Netherlands
CORE Database Statistics.
plos.figshare.com
xls
Updated Jun 1, 2023
Ann L. Griffen; Clifford J. Beall; Noah D. Firestone; Erin L. Gross; James M. DiFranco; Jori H. Hardman; Bastienne Vriesendorp; Russell A. Faust; Daniel A. Janies; Eugene J. Leys (2023). CORE Database Statistics. [Dataset]. http://doi.org/10.1371/journal.pone.0019051.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0019051.t001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS http://plos.org/
Authors
Ann L. Griffen; Clifford J. Beall; Noah D. Firestone; Erin L. Gross; James M. DiFranco; Jori H. Hardman; Bastienne Vriesendorp; Russell A. Faust; Daniel A. Janies; Eugene J. Leys
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CORE Database Statistics.
ShinyFMBN, a Shiny app to access FoodMicrobionet
data.mendeley.com
Updated Jun 11, 2020
+ more versions
Eugenio Parente (2020). ShinyFMBN, a Shiny app to access FoodMicrobionet [Dataset]. http://doi.org/10.17632/8fwwjpm79y.4
Explore at:
Unique identifier
https://doi.org/10.17632/8fwwjpm79y.4
Dataset updated
Jun 11, 2020
Authors
Eugenio Parente
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
This data set contains the ShinyFMBN app and related material. The ShinyFMBN app allows you to access FoodMicrobionet 3.1, a repository of data on food microbiome studies. To run the app you need to install R and RStudio.

This compressed folder contains:

a. folder data: contains a .RDS file of data extracted from FoodMicrobionet, to be used with the FMBNanalyzer script (see below)
b. folder FMBNanalyzer: contains the FMBNanalyzer_v_2_1.R script, which can be used for graphical and statistical analysis of data extracted from FoodMicrobionet
c. folder Gephi_network: contains a .gml file extracted from FoodMicrobionet using the ShinyFMBN app, a .gephi file created by importing it, and an example figure of the network
d. folder merge_phyloseq_objs: contains a proof-of-concept script which can be used to merge phyloseq objects extracted from FoodMicrobionet using the ShinyFMBN app, together with example data
e. folder ShinyFMBN: contains the app folder, the runShinyFMBN_2_1_4.R script (an R script to install all needed packages and run the app) and the app manual in .htm format

This version includes an improved version of the Shiny app and incorporates changes to the taxa table, which is now aligned to SILVA taxonomy (https://www.arb-silva.de/documentation/silva-taxonomy/). This change has become necessary to improve compatibility with new accessions to FoodMicrobionet, which now assigns taxonomy based on SILVA v138.
Laboratory quality-control data associated with samples analyzed for...
catalog.data.gov
data.usgs.gov
+1more
Updated Jul 6, 2024
+ more versions
U.S. Geological Survey (2024). Laboratory quality-control data associated with samples analyzed for microbiological constituents at the USGS Ohio Water Microbiology Laboratory [Dataset]. https://catalog.data.gov/dataset/laboratory-quality-control-data-associated-with-samples-analyzed-for-microbiological-const
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Survey http://www.usgs.gov/
Description
This dataset contains data tables of laboratory quality-control data associated with environmental samples analyzed for microbiological constituents at the Ohio Water Microbiology Laboratory of the U.S. Geological Survey (USGS). The environmental samples were collected across the United States by USGS National Projects and projects in Water Science Centers. These quality-control data can be used to assess the quality of microbiological data for the associated environmental samples.
Data from: Comparison of 432 Pseudomonas strains through integration of...
search.datacite.org
Updated Sep 26, 2018
J.J. (Jasper Jan) Koehorst (2018). Comparison of 432 Pseudomonas strains through integration of genomic, functional, metabolic and expression data [Dataset]. http://doi.org/10.4121/uuid:948c10fe-7ea5-47f3-bdb2-d5b0f908820b
Explore at:
Unique identifier
https://doi.org/10.4121/uuid:948c10fe-7ea5-47f3-bdb2-d5b0f908820b
Dataset updated
Sep 26, 2018
Dataset provided by
DataCite https://www.datacite.org/
4TU.Centre for Research Data
Authors
J.J. (Jasper Jan) Koehorst
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pseudomonas is a highly versatile genus containing species that can be harmful to humans and plants, while others are widely used for bioengineering and bioremediation. We analysed 432 sequenced Pseudomonas strains by integrating results from a large-scale functional comparison using protein domains with data from six metabolic models, nearly a thousand transcriptome measurements and four large-scale transposon mutagenesis experiments. Through heterogeneous data integration we linked gene essentiality, persistence and expression variability. The pan-genome of Pseudomonas is closed, indicating a limited role of horizontal gene transfer in the evolutionary history of this genus. A large fraction of essential genes are highly persistent, yet non-essential genes still represent a considerable fraction of the core-genome. Our results emphasize the power of integrating large-scale comparative functional genomics with heterogeneous data for exploring bacterial diversity and versatility.
Resilience of Microbial Communities Sequence Data Set
catalog.data.gov
data.amerigeoss.org
Updated Nov 12, 2020
U.S. EPA Office of Research and Development (ORD) (2020). Resilience of Microbial Communities Sequence Data Set [Dataset]. https://catalog.data.gov/dataset/resilience-of-microbial-communities-sequence-data-set
Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency http://www.epa.gov/
Description
The EX_Genome_Assemblies.zip file contains the contig sequences (i.e. assemblies) of fifteen isolates used for genomic and antibiotic resistance gene (ARG) analysis. The EX_OTU.fasta file contains the sequences of the V4 region (≈250 nt) of the bacterial 16S rRNA-encoding gene for each Operational Taxonomic Unit (OTU). This dataset is associated with the following publication: Gomez-Alvarez, V., S. Pfaller, J. Pressman, D. Wahman, and R. Revetta. Resilience of microbial communities in a simulated drinking water distribution system subjected to disturbances: role of conditionally rare taxa and potential implications for antibiotic-resistant bacteria. Environmental Science: Water Research & Technology. Royal Society of Chemistry, Cambridge, UK, 2: 645-657, (2016).
ComBase: A Combined Database For Predictive Microbiology

RRID:SCR_008181, nif-0000-21095, ComBase: A Combined Database For Predictive Microbiology (RRID:SCR_008181), ComBase

8 scholarly articles cite this dataset (View in Google Scholar)