100+ datasets found

MicrobiomeHD: the human gut microbiome in health and disease
zenodo.org
search.datacite.org
application/gzip
Updated Jan 24, 2020
Cite
Claire Duvallet; Sean Gibbons; Thomas Gurry; Rafael Irizarry; Eric Alm (2020). MicrobiomeHD: the human gut microbiome in health and disease [Dataset]. http://doi.org/10.5281/zenodo.569601
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.569601
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Claire Duvallet; Sean Gibbons; Thomas Gurry; Rafael Irizarry; Eric Alm
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Overview

MicrobiomeHD is a standardized database of human gut microbiome studies in health and disease. This database includes publicly available 16S data from published case-control studies and their associated patient metadata. Raw sequencing data for each study was downloaded and processed through a standardized pipeline.

To be included in MicrobiomeHD, datasets have:

publicly available raw sequencing data (fastq or fasta)

publicly available metadata with at least case and control labels for each patient

at least 15 case patients

Currently, MicrobiomeHD is focused on stool samples. Additional samples may be included in certain datasets, as indicated in the metadata.

Files

Additional information about the datasets included in this MicrobiomeHD release is in the MicrobiomeHD GitHub repo https://github.com/cduvallet/microbiomeHD, in the file db/dataset_info.yaml. Top-level identifiers correspond to the dataset IDs used in Duvallet et al. 2017. Sample sizes in the YAML file are those described in the papers, and may not exactly reflect the actual data (due to missing/extra data, samples that didn't pass quality control, etc.).

Each dataset was downloaded and processed through a standardized pipeline. The raw processing results are available in the *.tar.gz files here. Each file has the same directory structure and files, as described in the pipeline documentation: http://amplicon-sequencing-pipeline.readthedocs.io/en/latest/output.html.
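Because every archive is described as sharing one directory layout, a small helper can locate the same files inside any of them. The following is a minimal sketch using Python's standard tarfile module. The function name `find_members` and the `FILES_OF_INTEREST` tuple are illustrative assumptions, and the suffixes are taken from the file names this description lists rather than checked against an actual download.

```python
import tarfile


def find_members(archive_path, suffixes):
    """Return the paths inside a .tar.gz archive that end with any given suffix."""
    wanted = tuple(suffixes)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return sorted(name for name in archive.getnames() if name.endswith(wanted))


# Suffixes of the files this description singles out (assumed, not verified
# against a real download).
FILES_OF_INTEREST = (
    "summary_file.txt",
    ".metadata.txt",
    ".otu_table.100.denovo.rdp_assigned",
    ".otu_seqs.100.fasta",
)
```

A call such as `find_members("crc_baxter_results.tar.gz", FILES_OF_INTEREST)` would return the matching member paths, which can then be passed to `TarFile.extractfile` for reading.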

Specific files of interest include:

summary_file.txt: this file contains a summary of all parameters used to process the data

datasetID.metadata.txt: the metadata associated with the samples. Note that some samples in the metadata may not have sequencing data, and vice versa.

RDP/datasetID.otu_table.100.denovo.rdp_assigned: the 100% OTU tables with Latin taxonomic names assigned using the RDP classifier.

datasetID.otu_seqs.100.fasta: representative sequences for each OTU in the 100% OTU table. OTU labels in the OTU table end with d_denovoID; these denovoIDs correspond to the sequences in this file.

Processing

The raw data was acquired as described in the supplementary materials of Duvallet et al.'s "Meta analysis of microbiome studies identifies shared and disease-specific patterns".

Raw sequencing data was processed with the Alm lab's in-house 16S processing pipeline: https://github.com/thomasgurry/amplicon_sequencing_pipeline

Pipeline documentation is available at: http://amplicon-sequencing-pipeline.readthedocs.io/

Metadata was extracted from the original papers and/or data sources, and formatted manually.

Contributing

MicrobiomeHD is a resource that can be used to extract disease-specific microbiome signals in individual case-control studies. Many microbes respond non-specifically to health and disease, and the majority of bacterial associations within individual studies overlap with this "core" response. Researchers should cross-check their results with the data presented here to ensure that their identified microbial associations are specific to their disease under study.

We provide an updated list of "core" microbes here, as well as the raw OTU tables for anyone who wishes to reproduce and adapt this analysis to their study question.

If you would like to include your case-control dataset in MicrobiomeHD, please email duvallet[at]mit.edu.

For us to process your data through our standard pipeline, you will need to provide the following files and information about your data:

raw sequencing data in fastq or fasta format (preferably fastq)

information about which processing steps will be required (e.g. removing primers or barcodes, merging paired-end reads, etc)

sample IDs associated with the sequencing data (either mapped to barcodes still in the sequences, or to each de-multiplexed sequencing file)

case/control metadata of each sample

other relevant metadata (e.g. sampling site, if not all samples are stool; sampling time point, if multiple samples per patient were taken; etc)

By using MicrobiomeHD in your own analyses, you agree to contribute your dataset to this database and to make your raw sequencing data (i.e. fastq files) publicly available.

Citing MicrobiomeHD

The MicrobiomeHD database and original publications for each of these datasets are described in Duvallet et al. (2017): http://biorxiv.org/content/early/2017/05/08/134031

If you use any of these datasets in your analysis, please cite both MicrobiomeHD (Duvallet et al. (2017)) and the original publication for each dataset that you use.

The code used to process and analyze this data in Duvallet et al. (2017) is available on github: https://github.com/cduvallet/microbiomeHD

Files

Core genera

file-S3.core_genera.txt: Supplemental Table 3 from Duvallet et al. (2017), listing the core health- and disease-associated microbes.

Datasets

Note that MicrobiomeHD contains all 28 datasets from Duvallet et al. (2017), as well as additional datasets which did not meet the inclusion criteria for the meta-analysis presented in the paper. Additional information about the datasets included in this MicrobiomeHD release is in the original publications and the MicrobiomeHD GitHub repo https://github.com/cduvallet/microbiomeHD, in the file db/dataset_info.yaml.

The sample sizes listed here reflect what was reported in the original publications. Some may have discrepancies between what is reported and what is in the actual data due to missing data, quality issues, barcode mismatches, etc.

asd_son_results.tar.gz (asd_son): NT: 44, ASD: 59
http://dx.doi.org/10.1371/journal.pone.0137725

autism_kb_results.tar.gz (asd_kang): H: 20, ASD: 20
http://dx.doi.org/10.1371/journal.pone.0068322

cdi_schubert_results.tar.gz (noncdi_schubert): H: 155, nonCDI: 89, CDI: 94
http://dx.doi.org/10.1128/mBio.01021-14

cdi_vincent_v3v5_results.tar.gz (cdi_vincent): H: 25, CDI: 25
http://dx.doi.org/10.1186/2049-2618-1-18

cdi_youngster_results.tar.gz (cdi_youngster): H: 4, CDI: 19
http://dx.doi.org/10.1093/cid/ciu135

crc_baxter_results.tar.gz (crc_baxter): adenoma: 198, H: 172, CRC: 120
http://dx.doi.org/10.1186/s13073-016-0290-3

crc_xiang_results.tar.gz (crc_chen): H: 22, CRC: 21
http://dx.doi.org/10.1371/journal.pone.0039743

crc_zackular_results.tar.gz (crc_zackular): adenoma: 30, H: 30, CRC: 30
http://dx.doi.org/10.1158/1940-6207.CAPR-14-0129

crc_zeller_results.tar.gz (crc_zeller): H: 75, CRC: 41
http://dx.doi.org/10.15252/msb.20145645

crc_zhao_results.tar.gz (crc_wang): H: 56, CRC: 46
http://dx.doi.org/10.1038/ismej.2011.109

edd_singh_results.tar.gz (edd_singh): STEC: 28, CAMP: 71, SALM: 66, SHIG: 34, H: 75
http://dx.doi.org/10.1186/s40168-015-0109-2

hiv_dinh_results.tar.gz (hiv_dinh): H: 16, HIV: 21
http://dx.doi.org/10.1093/infdis/jiu409

hiv_lozupone_results.tar.gz (hiv_lozupone): H: 13, HIV: 25
http://dx.doi.org/10.1016/j.chom.2013.08.006

hiv_noguerajulian_results.tar.gz (hiv_noguerajulian): H: 34, HIV: 206
https://doi.org/10.1016/j.ebiom.2016.01.032

ibd_alm_results.tar.gz (ibd_papa): IBDundef: 1, nonIBD: 24, UC: 43, CD: 23
http://dx.doi.org/10.1371/journal.pone.0039242

ibd_engstrand_maxee_results.tar.gz (ibd_willing): CCD: 12, H: 35, ICD: 15, UC: 16, ICCD: 2
http://dx.doi.org/10.1053/j.gastro.2010.08.049

ibd_gevers_2014_results.tar.gz (ibd_gevers): H: 31, CD: 224
http://dx.doi.org/10.1016/j.chom.2014.02.005

ibd_huttenhower_results.tar.gz (ibd_morgan): H: 18, UC: 48, CD: 62
http://dx.doi.org/10.1186/gb-2012-13-9-r79

mhe_zhang_results.tar.gz (liv_zhang): CIRR: 25, H: 26, MHE: 26
http://dx.doi.org/10.1038/ajg.2013.221

nash_chan_results.tar.gz (nash_wong): H: 22, NASH: 16
http://dx.doi.org/10.1371/journal.pone.0062885

nash_ob_baker_results.tar.gz (nash_zhu): H: 16, NASH: 22, OB: 25
http://dx.doi.org/10.1002/hep.26093

ob_goodrich_results.tar.gz (ob_goodrich): OW: 322, H: 433, OB: 183
http://dx.doi.org/10.1016/j.cell.2014.09.053

ob_gordon_2008_v2_results.tar.gz (ob_turnbaugh): H: 61, OB:
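
The group sizes in this listing follow a "label: count" pattern, so they can be turned into a lookup table with a few lines of code. This is only a sketch: the function name `parse_group_sizes` is hypothetical, and it assumes labels never contain commas or colons. Entries whose count is missing (as in a truncated line) are skipped rather than guessed.

```python
def parse_group_sizes(text):
    """Parse a string such as 'NT: 44, ASD: 59' into {'NT': 44, 'ASD': 59}."""
    sizes = {}
    for part in text.split(","):
        label, _, count = part.partition(":")
        label, count = label.strip(), count.strip()
        if label and count.isdigit():  # skip truncated or malformed entries
            sizes[label] = int(count)
    return sizes
```

Since the listed sizes reflect the original publications, any comparison against the downloaded metadata should expect some mismatches.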
HOMD
neuinfo.org
dknet.org
Updated Jan 29, 2022
Cite
(2022). HOMD [Dataset]. http://identifiers.org/RRID:SCR_012770
Unique identifier
https://identifiers.org/RRID:SCR_012770
Dataset updated
Jan 29, 2022
Description
THIS RESOURCE IS NO LONGER IN SERVICE. Documented on April 14, 2022. Database of comprehensive information on the approximately 600 prokaryote species that are present in the human oral cavity. The majority of these species are uncultivated and unnamed, recognized primarily by their 16S rRNA sequences. The HOMD presents a provisional naming scheme for the currently unnamed species so that strain, clone, and probe data from any laboratory can be directly linked to a stably named reference entity. The HOMD links sequence data with phenotypic, phylogenetic, clinical, and bibliographic information. Full and partial oral bacterial genome sequences determined as part of this project and the Human Microbiome Project are being added to the HOMD as they become available. HOMD offers easy-to-use tools for viewing all publicly available oral bacterial genomes. Data is also downloadable.
Human Microbiome Compendium dataset
zenodo.org
application/gzip, tsv
Updated Jan 8, 2024
Cite
Richard J. Abdill; Samantha P. Graham; Vincent Rubinetti; Frank W. Albert; Casey S. Greene; Sean Davis; Ran Blekhman (2024). Human Microbiome Compendium dataset [Dataset]. http://doi.org/10.5281/zenodo.10452633
Available download formats: application/gzip, tsv
Unique identifier
https://doi.org/10.5281/zenodo.10452633
Dataset updated
Jan 8, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Richard J. Abdill; Samantha P. Graham; Vincent Rubinetti; Frank W. Albert; Casey S. Greene; Sean Davis; Ran Blekhman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Human Microbiome Compendium is an ongoing project to build a large collection of human microbiome sequencing data processed with a uniform pipeline. Currently, the compendium contains 16S rRNA amplicon sequencing data for human gut microbiome samples retrieved from the Sequence Read Archive. Our website at microbiomap.org has more information about the project and links to related resources. This data is freely available under a CC-BY license; if you use it in your work, please cite our preprint, "Integration of 168,000 samples reveals global patterns of the human gut microbiome" (doi: 10.1101/2023.10.11.560955).

If you are using this dataset in conjunction with your own results, it's important to note that starting in version 1.0.1, the nomenclature used in this taxonomic table diverges from the output generated by DADA2 and the SILVA database. See the v1.0.1 release notes directly below for details.

Version history

1.0.1: The "asv_assignments" table was corrected to fix entries in which the taxonomic levels were incorrectly inferred from the reference database by DADA2 (e.g. genus "Brassicibacter" was listed as a family, genus "Gelria" was listed as an order). The problem is documented in issues attached to repositories for DADA2, DADA2 reference databases, and our MicroBioMap library. In short, problems were noted in v138 of the SILVA database in which taxonomic names were not recorded properly if they were missing levels (e.g. a taxon has been assigned a proposed genus, but not a family). This was addressed in v138.1, which we originally used for generating this dataset. However, several dozen entries remain incorrectly annotated in v138.1—our 1.0.1 release corrects these by filling in the nomenclature gaps with "(unclassified)" and moving the existing data to the correct level. 2881 ASV assignments were affected out of about 4.3 million. The new file "taxa_corrections.tsv" is a copy of the "bad-taxa.csv" list generated by Michael McLaren, with notes added to reflect what we changed.

1.0.0: Added README.md file to the repository, and added a link to the preprint and title/author metadata for the Zenodo entry

0.2.1: "sample_metadata.tsv" was missing (Note: This was accidentally tagged "0.2.0" in the version history.)

0.2.0: Replacing "country" column in sample_metadata.tsv with an "iso" column using the country code rather than name.

0.1.0: Prepping for public release
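
The 1.0.1 entry above mentions filling nomenclature gaps with "(unclassified)". The sketch below illustrates only that gap-filling idea, not the compendium's actual correction code. The rank order in `RANKS`, the function name `fill_rank_gaps`, and the dict-based lineage are all assumptions made for illustration, and the separate step of moving a misplaced name to its correct level is not shown.

```python
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus")  # assumed rank order


def fill_rank_gaps(lineage, placeholder="(unclassified)"):
    """Fill missing ranks that sit above the lowest assigned rank."""
    assigned = [i for i, rank in enumerate(RANKS) if lineage.get(rank)]
    if not assigned:
        return dict(lineage)
    lowest = assigned[-1]
    filled = dict(lineage)
    for rank in RANKS[: lowest + 1]:
        if not filled.get(rank):
            filled[rank] = placeholder
    return filled
```

For a lineage with a genus but no family, order, or class, the three missing levels receive the placeholder while ranks below the lowest assigned level are left untouched.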
The Human Microbiome Project
registry.opendata.aws
kaggle.com
Updated Apr 20, 2018
Cite
The National Institutes of Health Office of Strategic Coordination - The Common Fund (2018). The Human Microbiome Project [Dataset]. https://registry.opendata.aws/human-microbiome-project/
Dataset updated
Apr 20, 2018
Dataset provided by
The National Institutes of Health Office of Strategic Coordination - The Common Fund (https://commonfund.nih.gov/hmp)
Description
The NIH-funded Human Microbiome Project (HMP) is a collaborative effort of over 300 scientists from more than 80 organizations to comprehensively characterize the microbial communities inhabiting the human body and elucidate their role in human health and disease. To accomplish this task, microbial community samples were isolated from a cohort of 300 healthy adult human subjects at 18 specific sites within five regions of the body (oral cavity, airways, urogenital tract, skin, and gut). Targeted sequencing of the 16S bacterial marker gene and/or whole metagenome shotgun sequencing was performed for thousands of these samples. In addition, whole genome sequences were generated for isolate strains collected from human body sites to act as reference organisms for analysis. Finally, 16S marker and whole metagenome sequencing was also done on additional samples from people suffering from several disease conditions.
Pbac v1 database - Panda Gut Microbiome Database
figshare.com
application/x-gzip
Updated Jul 18, 2025
Cite
FEILONG DENG (2025). Pbac v1 database - Panda Gut Microbiome Database [Dataset]. http://doi.org/10.6084/m9.figshare.29599118.v2
Available download formats: application/x-gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.29599118.v2
Dataset updated
Jul 18, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
FEILONG DENG
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pbac (Panda Gut Microbiome Database): A Curated Resource for Giant Panda Gut Microbial Genomes. In this project, we conducted an in-depth analysis of giant panda metagenome-assembled genomes (MAGs), utilizing both Illumina and Nanopore sequencing technologies. Our extensive efforts resulted in the identification of 2,684 medium- to high-quality MAGs meeting specific criteria: completeness ≥ 50%, contamination < 10%, and length ≥ 500 kb. Remarkably, 960 MAGs surpassed the stringent high-quality thresholds of completeness ≥ 90% and contamination < 5%. Within this dataset, we identified 1,193 non-redundant MAGs through a 99% similarity threshold clustering, with 354 of them being of high quality. Taxonomic analysis revealed that 672 MAGs could be mapped to 219 known species, while 521 MAGs clustered into 228 unique groups, leading to the assignment of new genus or species identifiers.
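
The quality tiers quoted above can be expressed as a small classifier. This is a sketch under stated assumptions: the function name `mag_quality` and the tier labels are hypothetical, and it assumes high-quality MAGs must also satisfy the baseline length criterion, which the description implies (they are counted among the 2,684) but does not state outright.

```python
def mag_quality(completeness, contamination, length_bp):
    """Classify a MAG as 'high', 'medium', or 'below threshold'.

    completeness and contamination are percentages; length_bp is in base pairs.
    """
    passes_baseline = (
        completeness >= 50 and contamination < 10 and length_bp >= 500_000
    )
    if not passes_baseline:
        return "below threshold"
    if completeness >= 90 and contamination < 5:
        return "high"
    return "medium"
```

Note that the contamination bounds are strict, so a MAG with exactly 5% contamination falls in the medium tier even at high completeness.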
Human Oral Microbiome Database
bioregistry.io
Updated Apr 29, 2021
Cite
(2021). Human Oral Microbiome Database [Dataset]. https://bioregistry.io/homd.taxon
Dataset updated
Apr 29, 2021
Description
The Human Oral Microbiome Database (HOMD) provides a site-specific comprehensive database for the more than 600 prokaryote species that are present in the human oral cavity. It contains genomic information based on a curated 16S rRNA gene-based provisional naming scheme, and taxonomic information. This datatype contains taxonomic information.
Microplastics Fish Gut Microbiome Data For EDA/ML
kaggle.com
zip
Updated Jul 19, 2025
Cite
ISMAILDRISSI25 (2025). Microplastics Fish Gut Microbiome Data For EDA/ML [Dataset]. https://www.kaggle.com/datasets/ismaildrissi25/microplastics-fish-gut-microbiome-data-for-ml
Available download formats: zip (252677 bytes)
Dataset updated
Jul 19, 2025
Authors
ISMAILDRISSI25
License
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
This dataset was compiled for a Master's thesis project focused on investigating the gut microbiota response in fish exposed to microplastics. It contains cleaned and annotated metadata along with taxonomic abundance information and exposure features, prepared for predictive machine learning modeling.

Context

Microplastics (MPs) are emerging pollutants in aquatic ecosystems. Numerous studies have shown that MPs can impact the gut microbial composition of fish. This dataset integrates data from multiple studies through a meta-analysis approach, standardized using bioinformatics and machine learning pipelines.

Source

Sequences and metadata were extracted from public BioProject entries in the NCBI SRA database.

Data processing: QIIME2, Python (pandas, scikit-learn), Google Colab

Total size: ~648 FASTQ files → summarized into machine learning-ready tabular format

Applications

Microbiome classification modeling

Environmental ecotoxicology analysis

Meta-analysis benchmarking

Feature importance and interpretability (SHAP, feature selection)
Human Fecal Gut Microbiome proteomics
data.niaid.nih.gov
ebi.ac.uk
xml
Updated Aug 3, 2016
Cite
Greg Stupp; Dennis Wolan (2016). Human Fecal Gut Microbiome proteomics [Dataset]. https://data.niaid.nih.gov/resources?id=pxd003907
Available download formats: xml
Dataset updated
Aug 3, 2016
Dataset provided by
Assistant Professor Department of Molecular and Experimental Medicine TSRI - California Campus USA
TSRI
Authors
Greg Stupp; Dennis Wolan
Variables measured
Proteomics
Description
Five human fecal samples from healthy volunteers, collected and prepared for standard MudPIT data collection and searched with the ComPIL database. Three replicates each.
Microbial Community Database (MiCoDa). A curated global 16S rRNA gene...
gbif.org
Updated Oct 23, 2025
Cite
Stephanie Jurburg; Clara Arboleda-Baena; Anahita Kazem; Tobias Frøslev; Thomas Jeppesen (2025). Microbial Community Database (MiCoDa). A curated global 16S rRNA gene amplicon dataset from all environments [Dataset]. http://doi.org/10.15468/ver9ne
Unique identifier
https://doi.org/10.15468/ver9ne
Dataset updated
Oct 23, 2025
Dataset provided by
Global Biodiversity Information Facility (https://www.gbif.org/)
German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig
Authors
Stephanie Jurburg; Clara Arboleda-Baena; Anahita Kazem; Tobias Frøslev; Thomas Jeppesen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 2001 - Dec 31, 2023
Description
MiCoDa is a searchable database that hosts over 30,000 samples of processed 16S rRNA gene amplicon sequences from aquatic, host-associated, and mineral environments, spanning the entire globe. To improve cross-study comparability, all samples in MiCoDa have been sequenced in the same region of the 16S rRNA gene (between base pairs 515 and 806). MiCoDa also hosts the Earth Microbiome Project samples, processed in the same manner. MiCoDa is currently the largest public, human-curated microbiome database available. Its goal is to encourage the reuse of extant sequence data by specialists and non-specialists alike. To this end, we have manually curated the data and metadata included, preprocessed the sequence data to maximize comparability, and created a searchable data portal. MiCoDa is led by Dr. Stephanie Jurburg (microbial ecology), and hosted and supported by the Integrative Biodiversity Data and Code Unit of the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, the Microbial Interaction Ecology group of the Helmholtz Centre for Environmental Research - Leipzig, and the FUSION group of Friedrich Schiller Universität - Jena. For more information about MiCoDa and the Data Collection, visit https://micoda.idiv.de/v1/dataCollection
[This dataset was processed using the GBIF Metabarcoding Data Toolkit.]
Data from: Methanobacteriaceae
figshare.com
bin
Updated May 31, 2023
Cite
Ruaud (2023). Methanobacteriaceae [Dataset]. http://doi.org/10.6084/m9.figshare.20145425.v1
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.20145425.v1
Dataset updated
May 31, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Ruaud
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data for the Methanobacteriaceae analysis.

Metagenomes and associated sample metadata from a globally distributed set of studies were gathered from the curatedMetagenomicData (Pasolli et al. 2017) by Youngblut et al. (2020). Briefly, (i) metagenomes were profiled with the HUMAnN2 pipeline to obtain metabolic pathway profiles based on the MetaCyc database, and with Kraken2 and Bracken v2.2 based on a customized Genome Taxonomy Database (GTDB), Release 89.0, created with Struo v0.1.6 (available at http://ftp.tue.mpg.de/ebio/projects/struo/), for the taxonomic profiles; (ii) rare taxa were filtered out and taxonomic ranks from family to species were included (n = 2190 taxa; 181 families, 562 genera and 1447 species); (iii) relative abundances of MetaCyc metabolic pathways at the community level with complete coverage and a prevalence greater than 25% were included (n = 117 pathways).

Metadata: information on samples

Taxa relative abundances - no methanogens: relative abundances at the family, genus, and species levels; all Methanobacteriaceae were removed.

MetaCyc pathways - no methanogens: relative abundances at the community level of MetaCyc pathways; all pathways in which Methanobacteriaceae participate were removed.

Methanogens-targets: predicted target = presence/absence of Methanobacteriaceae (Mtbc). The Msmi column corresponds to the presence/absence of the species Methanobrevibacter smithii.
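
The description above keeps only pathways with a prevalence greater than 25%. A minimal sketch of such a prevalence filter follows, assuming prevalence means the fraction of samples with a non-zero abundance; the source does not define it, so this is an assumption, as are the function names and the dict-of-lists table layout. The "complete coverage" criterion is not modelled here.

```python
def prevalence(values):
    """Fraction of samples in which a feature has a non-zero abundance."""
    return sum(1 for v in values if v > 0) / len(values)


def filter_by_prevalence(table, min_prevalence=0.25):
    """Keep features whose prevalence is strictly greater than min_prevalence.

    `table` maps feature name -> list of per-sample relative abundances.
    """
    return {
        name: values
        for name, values in table.items()
        if values and prevalence(values) > min_prevalence
    }
```

The comparison is strict, matching "greater than 25%", so a feature present in exactly one of four samples is dropped.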
Supporting data for the dynamics and stabilization of the Human gut...
gigadb.org
aspera.gigadb.org
Updated May 14, 2015
Cite
(2015). Supporting data for the dynamics and stabilization of the Human gut microbiome during the first year of life. [Dataset]. http://doi.org/10.5524/100145
Unique identifier
https://doi.org/10.5524/100145
Dataset updated
May 14, 2015
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Description
Here we performed metagenomic shotgun sequencing on fecal samples from 98 full-term Swedish infants (newborn, 4 months, and 12 months old) and their mothers, assembled gut microbial genomes, and constructed reference gene catalogs from the cohort. We generated 1.52 Tb of high-quality paired-end reads (average 3.99 Gb per sample). A gene catalog was constructed for each time point based on de novo assembly and metagenomic gene prediction, and functionally annotated using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. We also assembled a total of 4,356 microbial genomes (>0.9 MB) de novo by binning assembled contigs according to abundance variations across samples. These de novo assembled genomes were complemented by 1,147 genomes from the National Center for Biotechnology Information (NCBI) Bacteria/Archaea genome database. All genomes were subsequently clustered into 690 unique metagenomic operational taxonomic units (MetaOTUs) that were equivalent to species-level classifications. Of these, 373 were annotated to species; the remaining 317 represent novel species related to known species. We constructed the MetaOTU profile by mapping reads to our MetaOTU sequences.
MiMeDB
neuinfo.org
dknet.org
Updated Mar 18, 2024
Cite
(2024). MiMeDB [Dataset]. http://identifiers.org/RRID:SCR_025108
Unique identifier
https://identifiers.org/RRID:SCR_025108
Dataset updated
Mar 18, 2024
Description
Database containing detailed information about small molecules produced by the human microbiome. Provides metabolite data including structure, names, descriptions, chemical taxonomy, chemical ontology, physico-chemical data, and spectra, and contains detailed information about the microbes that produce these chemicals, the enzymatic reactions responsible for their production, the bioactivity of the chemicals, and the anatomical location of these chemicals and microbes. Many data fields in the database are hyperlinked to other databases including FooDB, HMDB, KEGG, PubChem, MetaCyc, ChEBI, UniProt, and GenBank. Database is FAIR compliant. The data in MiMeDB are released under the Creative Commons (CC) 4.0 License.
Data from: Microbial Observatory (ISS-MO): Indoor microbiome study of the...
data.nasa.gov
gimi9.com
Updated Apr 1, 2025
Cite
nasa.gov (2025). Microbial Observatory (ISS-MO): Indoor microbiome study of the International Space Station surfaces [Dataset]. https://data.nasa.gov/dataset/microbial-observatory-iss-mo-indoor-microbiome-study-of-the-international-space-station-su-c261f
Dataset updated
Apr 1, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
Presented here is the environmental microbiome study of the International Space Station surfaces. The environmental samples were collected with the polyester wipes from eight different locations in the ISS during two consecutive sampling sessions (three months apart). The specific objective was to unveil the pool of genes for each location during two separate sessions to learn of functional and metabolic diversity of microorganisms in the ISS. The International Space Station (ISS) as a closed built environment has its own environmental microbiome which is shaped by microgravity, radiation, and limited human presence. The microbial diversity associated with ISS environmental surfaces was investigated during this study. Polyester wipes and contact slides were used for sampling of eight various surface locations on the ISS at different time periods. The samples were retrieved and analyzed immediately upon the return to the Earth (via Soyuz TMA-14M or Dragon capsule from SpaceX). After surface sample collection, contact slides containing nutrient media for the growth of bacteria and fungi were incubated at 25C. The polyester wipes were processed to measure microbial burden (R2A, Blood Agar, and Potato Dextrose Agar) and recover cultivable bacteria as well as fungi. Subsequently, viable microbial burden was assessed using Adenosine Triphosphate (ATP) assay, and quantitative polymerase chain reaction (PCR) methods after propidium monoazide (PMA) treatment. The 16S-tag and metagenome analyses were used to elucidate viable microbial diversity. The cultivable bacterial population yield from the polyester wipes was very high (5 to 7-logs) when compared with the contact slides (10^2 to 10^3 CFU/m2). The PMA-qPCR analysis showed considerable variation of viable bacterial population (10^5 to 10^9 16S rDNA gene copies/m2) among locations sampled. 
Unlike contact slides, polyester wipes cover a much larger sample surface (~1 m2) and produce much more reliable results on the microbial diversity of the ISS, covering both cultivable and non-cultivable species. The cultivable, total, and viable microbial diversity was determined using state-of-the-art molecular techniques. The implementation of the PMA assay before DNA extraction allowed viable microorganisms to be distinguished, which is crucial for determining their role in crew health, ISS maintenance, and the general knowledge of closed, environmentally controlled built systems.
gut-microbiome-allergy-data
huggingface.co
Updated Jan 7, 2026
Cite
Hugging Science (2026). gut-microbiome-allergy-data [Dataset]. https://huggingface.co/datasets/hugging-science/gut-microbiome-allergy-data
Dataset updated
Jan 7, 2026
Dataset authored and provided by
Hugging Science
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Card for Gut Microbiome–Food Allergy Prediction Datasets

Dataset Summary

This repository contains multiple human gut microbiome datasets curated for predicting food allergy development. Each dataset corresponds to a distinct cohort with longitudinal microbiome sampling, providing both metadata and derived embeddings suitable for machine learning. The datasets are designed to support binary classification of subjects into healthy vs allergic categories, enabling… See the full description on the dataset page: https://huggingface.co/datasets/hugging-science/gut-microbiome-allergy-data.
Data and code from: Learning a deep language model for microbiomes: The...
data.niaid.nih.gov
search.dataone.org
zip
Updated Feb 20, 2025
Cite
Quintin Pope; Rohan Varma; Christine Tataru; Maude David; Xiaoli Fern (2025). Data and code from: Learning a deep language model for microbiomes: The power of large scale unlabeled microbiome data [Dataset]. http://doi.org/10.5061/dryad.tb2rbp08p
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.tb2rbp08p
Dataset updated
Feb 20, 2025
Dataset provided by
University of Michigan
Oregon State University
Authors
Quintin Pope; Rohan Varma; Christine Tataru; Maude David; Xiaoli Fern
License
CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)
Description
We use open source human gut microbiome data to learn a microbial “language” model by adapting techniques from Natural Language Processing (NLP). Our microbial “language” model is trained in a self-supervised fashion (i.e., without additional external labels) to capture the interactions among different microbial taxa and the common compositional patterns in microbial communities. The learned model produces contextualized taxon representations that allow a single microbial taxon to be represented differently according to the specific microbial environment in which it appears. The model further provides a sample representation by collectively interpreting different microbial taxa in the sample and their interactions as a whole. We demonstrate that, while our sample representation performs comparably to baseline models in in-domain prediction tasks such as predicting Inflammatory Bowel Disease (IBD) and diet patterns, it significantly outperforms them when generalizing to test data from independent studies, even in the presence of substantial distribution shifts. Through a variety of analyses, we further show that the pre-trained, context-sensitive embedding captures meaningful biological information, including taxonomic relationships, correlations with biological pathways, and relevance to IBD expression, despite the model never being explicitly exposed to such signals.

Methods

No additional raw data was collected for this project. All inputs are publicly available. American Gut Project, Halfvarson, and Schirmer raw data are available from the NCBI database (accession numbers PRJEB11419, PRJEB18471, and PRJNA398089, respectively). We used the curated data produced by Tataru and David, 2020.
Human Skin Microbiome Data (16S rRNA sequencing)
data.mendeley.com
search.datacite.org
Updated Oct 15, 2020
+ more versions
Marisa Nielsen (2020). Human Skin Microbiome Data (16S rRNA sequencing) [Dataset]. http://doi.org/10.17632/th7bfgfc6m.1
Unique identifier
https://doi.org/10.17632/th7bfgfc6m.1
Dataset updated
Oct 15, 2020
Authors
Marisa Nielsen
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
16S rRNA sequencing data on human skin microbiome samples collected before and after swimming in the ocean. This dataset contains raw sequencing data contained in fasta and qual files produced from an Ion Torrent PGM sequencer. There were 2 sampling occurrences (041218 and 092718) and each occurrence has an associated fasta and qual file. This dataset contains the 041218 sampling data only due to storage restrictions. The other dataset is published separately. Our research has shown that the human skin microbiome is altered after swimming in the ocean. Normal commensals were washed off and simultaneously, exogenous bacteria were deposited on the skin. QIIME was used for initial analysis and indicated that the abundance and diversity of microbial communities on the skin increased after swimming and these changes persisted for more than 24 hours. Downstream analysis using PICRUSt to predict functional metagenomics indicated that there was an increase in antibiotic resistance genes, antibiotic biosynthesis genes, and virulence factor genes on the skin after ocean water exposure.
dbBact v2022.07.01 database dump
zenodo.org
bin
Updated May 25, 2023
Amnon Amir (2023). dbBact v2022.07.01 database dump [Dataset]. http://doi.org/10.5281/zenodo.7961961
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.7961961
Dataset updated
May 25, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Amnon Amir
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
A full database dump (postgres) for dbBact, the microbiome database (see website).

This is version 2022.07.01, used in the paper examples.

Latest version can be downloaded from:

https://dbbact.org/download
Data from: Structural Insights into Endobiotic Reactivation by Human Gut...
datasetcatalog.nlm.nih.gov
Updated Sep 30, 2020
Lim, Lauren; Gibbs, Morgan E.; Redinbo, Matthew R.; Creekmore, Benjamin C.; Simpson, Joshua B.; Walton, William G.; Ervin, Samantha M.; Gharaibeh, Raad Z. (2020). Structural Insights into Endobiotic Reactivation by Human Gut Microbiome-Encoded Sulfatases [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000564343
Dataset updated
Sep 30, 2020
Authors
Lim, Lauren; Gibbs, Morgan E.; Redinbo, Matthew R.; Creekmore, Benjamin C.; Simpson, Joshua B.; Walton, William G.; Ervin, Samantha M.; Gharaibeh, Raad Z.
Description
Phase II drug metabolism inactivates xenobiotics and endobiotics through the addition of either a glucuronic acid or sulfate moiety prior to excretion, often via the gastrointestinal tract. While the human gut microbial β-glucuronidase enzymes that reactivate glucuronide conjugates in the intestines are becoming well characterized and even controlled by targeted inhibitors, the sulfatases encoded by the human gut microbiome have not been comprehensively examined. Gut microbial sulfatases are poised to reactivate xenobiotics and endobiotics, which are then capable of undergoing enterohepatic recirculation or exerting local effects on the gut epithelium. Here, using protein structure-guided methods, we identify 728 distinct microbiome-encoded sulfatase proteins from the 4.8 million unique proteins present in the Human Microbiome Project Stool Sample database and 1766 gut microbial sulfatases from the 9.9 million sequences in the Integrated Gene Catalogue. We purify a representative set of these sulfatases, elucidate crystal structures, and pinpoint unique structural motifs essential to endobiotic sulfate processing. Gut microbial sulfatases differentially process sulfated forms of the neurotransmitters serotonin and dopamine, and the hormones melatonin, estrone, dehydroepiandrosterone, and thyroxine in a manner dependent both on variabilities in active site architecture and on markedly distinct oligomeric states. Taken together, these data provide initial insights into the structural and functional diversity of gut microbial sulfatases, providing a path toward defining the roles these enzymes play in health and disease.
Drinking Water Microbiome Taxonomic Lineage Abundance Data Set
catalog.data.gov
gimi9.com
Updated Nov 12, 2020
U.S. EPA Office of Research and Development (ORD) (2020). Drinking Water Microbiome Taxonomic Lineage Abundance Data Set [Dataset]. https://catalog.data.gov/dataset/drinking-water-microbiome-taxonomic-lineage-abundance-data-set
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
An abundance matrix (BM_taxonomic_lineage.xlsx) contains rows as taxonomic lineage, columns as samples, and entries representing the abundance of each lineage as a ratio of all sequences obtained for each individual sample. This dataset is associated with the following publication: Gomez-Alvarez, V., and R. Revetta. Monitoring of Nitrification in Chloraminated Drinking Water Distribution Systems With Microbiome Bioindicators Using Supervised Machine Learning. Frontiers in Microbiology. Frontiers, Lausanne, SWITZERLAND, 11: 2254-2267, (2020).
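As a rough illustration of how entries of this kind are derived, the sketch below converts per-sample read counts into per-sample ratios. The nested-dictionary input and the lineage and sample names are hypothetical; they are not taken from BM_taxonomic_lineage.xlsx, which already stores the ratios.

```python
def relative_abundance(counts):
    """Convert read counts to per-sample fractions.

    counts: {lineage: {sample: read_count}} (hypothetical layout).
    Returns the same shape, with each entry divided by its sample's total.
    """
    # Total number of reads observed in each sample, across all lineages.
    totals = {}
    for per_sample in counts.values():
        for sample, n in per_sample.items():
            totals[sample] = totals.get(sample, 0) + n

    # Divide each count by its sample total; empty samples map to 0.0.
    return {
        lineage: {
            sample: (n / totals[sample] if totals[sample] else 0.0)
            for sample, n in per_sample.items()
        }
        for lineage, per_sample in counts.items()
    }


# Made-up counts for two lineages in two samples.
example_counts = {
    "Bacteria;Proteobacteria": {"sample_1": 3, "sample_2": 0},
    "Bacteria;Nitrospirae": {"sample_1": 1, "sample_2": 4},
}
example_ratios = relative_abundance(example_counts)
```

With this layout, the ratios within each sample sum to 1, which is a quick sanity check to run on any matrix of this type.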
Metadata of the Global database of environmental microbiomes
figshare.com
xlsx
Updated Apr 28, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lucie Malard (2025). Metadata of the Global database of environmental microbiomes [Dataset]. http://doi.org/10.6084/m9.figshare.28358741.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.28358741.v1
Dataset updated
Apr 28, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Lucie Malard
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Metadata of the Global database of environmental microbiomes - Bacteria.

Claire Duvallet; Sean Gibbons; Thomas Gurry; Rafael Irizarry; Eric Alm (2020). MicrobiomeHD: the human gut microbiome in health and disease [Dataset]. http://doi.org/10.5281/zenodo.569601

MicrobiomeHD: the human gut microbiome in health and disease

9 scholarly articles cite this dataset (View in Google Scholar)

Available download formats: application/gzip

Unique identifier

https://doi.org/10.5281/zenodo.569601

Dataset updated

Jan 24, 2020

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Claire Duvallet; Sean Gibbons; Thomas Gurry; Rafael Irizarry; Eric Alm

License

Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically

Description

Overview

MicrobiomeHD is a standardized database of human gut microbiome studies in health and disease. This database includes publicly available 16S data from published case-control studies and their associated patient metadata. Raw sequencing data for each study was downloaded and processed through a standardized pipeline.

To be included in MicrobiomeHD, datasets have:

publicly available raw sequencing data (fastq or fasta)
publicly available metadata with at least case and control labels for each patient
at least 15 case patients

Currently, MicrobiomeHD is focused on stool samples. Additional samples may be included in certain datasets, as indicated in the metadata.

Files

Additional information about the datasets included in this MicrobiomeHD release is in the MicrobiomeHD GitHub repo https://github.com/cduvallet/microbiomeHD, in the file db/dataset_info.yaml. Top-level identifiers correspond to the dataset IDs used in Duvallet et al. 2017. Sample sizes in the yaml file are those described in the papers and may not exactly reflect the actual data (due to missing or extra data, samples that didn't pass quality control, etc.).

Each dataset was downloaded and processed through a standardized pipeline. The raw processing results are available in the *.tar.gz files here. Each file has the same directory structure and files, as described in the pipeline documentation: http://amplicon-sequencing-pipeline.readthedocs.io/en/latest/output.html.

Specific files of interest include:

summary_file.txt: this file contains a summary of all parameters used to process the data
datasetID.metadata.txt: the metadata associated with the samples. Note that some samples in the metadata may not have sequencing data, and vice versa.
RDP/datasetID.otu_table.100.denovo.rdp_assigned: the 100% OTU tables with Latin taxonomic names assigned using the RDP classifier.
datasetID.otu_seqs.100.fasta: representative sequences for each OTU in the 100% OTU table. OTU labels in the OTU table end with d_denovoID - these denovoIDs correspond to the sequences in this file.

Processing

The raw data was acquired as described in the supplementary materials of Duvallet et al.'s "Meta analysis of microbiome studies identifies shared and disease-specific patterns".

Raw sequencing data was processed with the Alm lab's in-house 16S processing pipeline: https://github.com/thomasgurry/amplicon_sequencing_pipeline

Pipeline documentation is available at: http://amplicon-sequencing-pipeline.readthedocs.io/

Metadata was extracted from the original papers and/or data sources, and formatted manually.
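
Because some samples in the metadata may lack sequencing data and vice versa, a first step with any results archive is to see which sample IDs appear in both files. The following is a minimal sketch of such a check, not part of the MicrobiomeHD pipeline. It assumes both files are tab-delimited, that the OTU table has OTU labels in its first column and sample IDs in its header row, and that the metadata has sample IDs in its first column; none of this is confirmed here, so consult the pipeline documentation. The demo archive, its file names, and the DiseaseState column are made up for illustration.

```python
import csv
import io
import tarfile


def read_table(tar, suffix):
    """Return (header, rows) for the first archive member whose name ends with suffix."""
    for member in tar.getmembers():
        if member.isfile() and member.name.endswith(suffix):
            raw = tar.extractfile(member).read().decode("utf-8")
            rows = list(csv.reader(io.StringIO(raw), delimiter="\t"))
            return rows[0], rows[1:]
    raise FileNotFoundError(f"no archive member ends with {suffix!r}")


def sample_overlap(otu_header, meta_rows):
    """Compare sample IDs in the OTU table header with those in the metadata."""
    otu_samples = set(otu_header[1:])  # assumption: first column holds OTU labels
    meta_samples = {row[0] for row in meta_rows if row}  # assumption: first column holds sample IDs
    return {
        "both": sorted(otu_samples & meta_samples),
        "only_otu": sorted(otu_samples - meta_samples),
        "only_metadata": sorted(meta_samples - otu_samples),
    }


def make_demo_archive():
    """Build a tiny in-memory .tar.gz with hypothetical file names and contents."""
    files = {
        "demo_results/RDP/demo.otu_table.100.denovo.rdp_assigned":
            "\tS1\tS2\tS3\nk__Bacteria;d__denovo1\t5\t0\t2\n",
        "demo_results/demo.metadata.txt":
            "\tDiseaseState\nS1\tH\nS2\tCD\nS4\tCD\n",
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def demo():
    """Run the overlap check on the in-memory demo archive."""
    with tarfile.open(fileobj=make_demo_archive(), mode="r:gz") as tar:
        otu_header, _ = read_table(tar, ".otu_table.100.denovo.rdp_assigned")
        _, meta_rows = read_table(tar, ".metadata.txt")
    return sample_overlap(otu_header, meta_rows)
```

For a downloaded archive, `tarfile.open(path, mode="r:gz")` would replace the in-memory demo. Samples listed under `only_otu` or `only_metadata` are the ones to exclude or investigate before comparing cases and controls.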

Contributing

MicrobiomeHD is a resource that can be used to extract disease-specific microbiome signals in individual case-control studies. Many microbes respond non-specifically to health and disease, and the majority of bacterial associations within individual studies overlap with this "core" response. Researchers should cross-check their results with the data presented here to ensure that their identified microbial associations are specific to their disease under study.

We provide an updated list of "core" microbes here, as well as the raw OTU tables for anyone who wishes to reproduce and adapt this analysis to their study question.

If you would like to include your case-control dataset in MicrobiomeHD, please email duvallet[at]mit.edu.

For us to process your data through our standard pipeline, you will need to provide the following files and information about your data:

raw sequencing data in fastq or fasta format (preferably fastq)
information about which processing steps will be required (e.g. removing primers or barcodes, merging paired-end reads, etc)
sample IDs associated with the sequencing data (either mapped to barcodes still in the sequences, or to each de-multiplexed sequencing file)
case/control metadata of each sample
other relevant metadata (e.g. sampling site, if not all samples are stool; sampling time point, if multiple samples per patient were taken; etc)

By using MicrobiomeHD in your own analyses, you agree to contribute your dataset to this database and to make your raw sequencing data (i.e. fastq files) publicly available.

Citing MicrobiomeHD

The MicrobiomeHD database and original publications for each of these datasets are described in Duvallet et al. (2017): http://biorxiv.org/content/early/2017/05/08/134031

If you use any of these datasets in your analysis, please cite both MicrobiomeHD (Duvallet et al. (2017)) and the original publication for each dataset that you use.

The code used to process and analyze this data in Duvallet et al. (2017) is available on github: https://github.com/cduvallet/microbiomeHD

Files

Core genera

file-S3.core_genera.txt: Supplemental Table 3 from Duvallet et al. (2017), listing the core health- and disease-associated microbes.

Datasets

Note that MicrobiomeHD contains all 28 datasets from Duvallet et al. (2017), as well as additional datasets which did not meet the inclusion criteria for the meta-analysis presented in the paper. Additional information about the datasets included in this MicrobiomeHD release is in the original publications and the MicrobiomeHD GitHub repo https://github.com/cduvallet/microbiomeHD, in the file db/dataset_info.yaml.

The sample sizes listed here reflect what was reported in the original publications. Some may have discrepancies between what is reported and what is in the actual data due to missing data, quality issues, barcode mismatches, etc.

asd_son_results.tar.gz (asd_son): NT: 44, ASD: 59
http://dx.doi.org/10.1371/journal.pone.0137725

autism_kb_results.tar.gz (asd_kang): H: 20, ASD: 20
http://dx.doi.org/10.1371/journal.pone.0068322

cdi_schubert_results.tar.gz (noncdi_schubert): H: 155, nonCDI: 89, CDI: 94
http://dx.doi.org/10.1128/mBio.01021-14

cdi_vincent_v3v5_results.tar.gz (cdi_vincent): H: 25, CDI: 25
http://dx.doi.org/10.1186/2049-2618-1-18

cdi_youngster_results.tar.gz (cdi_youngster): H: 4, CDI: 19
http://dx.doi.org/10.1093/cid/ciu135

crc_baxter_results.tar.gz (crc_baxter): adenoma: 198, H: 172, CRC: 120
http://dx.doi.org/10.1186/s13073-016-0290-3

crc_xiang_results.tar.gz (crc_chen): H: 22, CRC: 21
http://dx.doi.org/10.1371/journal.pone.0039743

crc_zackular_results.tar.gz (crc_zackular): adenoma: 30, H: 30, CRC: 30
http://dx.doi.org/10.1158/1940-6207.CAPR-14-0129

crc_zeller_results.tar.gz (crc_zeller): H: 75, CRC: 41
http://dx.doi.org/10.15252/msb.20145645

crc_zhao_results.tar.gz (crc_wang): H: 56, CRC: 46
http://dx.doi.org/10.1038/ismej.2011.109

edd_singh_results.tar.gz (edd_singh): STEC: 28, CAMP: 71, SALM: 66, SHIG: 34, H: 75
http://dx.doi.org/10.1186/s40168-015-0109-2

hiv_dinh_results.tar.gz (hiv_dinh): H: 16, HIV: 21
http://dx.doi.org/10.1093/infdis/jiu409

hiv_lozupone_results.tar.gz (hiv_lozupone): H: 13, HIV: 25
http://dx.doi.org/10.1016/j.chom.2013.08.006

hiv_noguerajulian_results.tar.gz (hiv_noguerajulian): H: 34, HIV: 206
https://doi.org/10.1016/j.ebiom.2016.01.032

ibd_alm_results.tar.gz (ibd_papa): IBDundef: 1, nonIBD: 24, UC: 43, CD: 23
http://dx.doi.org/10.1371/journal.pone.0039242

ibd_engstrand_maxee_results.tar.gz (ibd_willing): CCD: 12, H: 35, ICD: 15, UC: 16, ICCD: 2
http://dx.doi.org/10.1053/j.gastro.2010.08.049

ibd_gevers_2014_results.tar.gz (ibd_gevers): H: 31, CD: 224
http://dx.doi.org/10.1016/j.chom.2014.02.005

ibd_huttenhower_results.tar.gz (ibd_morgan): H: 18, UC: 48, CD: 62
http://dx.doi.org/10.1186/gb-2012-13-9-r79

mhe_zhang_results.tar.gz (liv_zhang): CIRR: 25, H: 26, MHE: 26
http://dx.doi.org/10.1038/ajg.2013.221

nash_chan_results.tar.gz (nash_wong): H: 22, NASH: 16
http://dx.doi.org/10.1371/journal.pone.0062885

nash_ob_baker_results.tar.gz (nash_zhu): H: 16, NASH: 22, OB: 25
http://dx.doi.org/10.1002/hep.26093

ob_goodrich_results.tar.gz (ob_goodrich): OW: 322, H: 433, OB: 183
http://dx.doi.org/10.1016/j.cell.2014.09.053

ob_gordon_2008_v2_results.tar.gz (ob_turnbaugh): H: 61, OB:
