100+ datasets found
  1. Guidelines for describing a microbiome data analysis

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Oct 18, 2024
    Cite
    Amy Willis; David Clausen (2024). Guidelines for describing a microbiome data analysis [Dataset]. http://doi.org/10.5061/dryad.q2bvq83vc
    Available download formats: zip
    Dataset updated
    Oct 18, 2024
    Dataset provided by
    Dryad
    Authors
    Amy Willis; David Clausen
    Time period covered
    Oct 4, 2024
    Description

    These guidelines were drafted by the authors.

  2. Additional file 2: of iMAP: an integrated bioinformatics and visualization...

    • springernature.figshare.com
    html
    Updated Jun 2, 2023
    Cite
    Teresia Buza; Triza Tonui; Francesca Stomeo; Christian Tiambo; Robab Katani; Megan Schilling; Beatus Lyimo; Paul Gwakisa; Isabella Cattadori; Joram Buza; Vivek Kapur (2023). Additional file 2: of iMAP: an integrated bioinformatics and visualization pipeline for microbiome data analysis [Dataset]. http://doi.org/10.6084/m9.figshare.8637551.v1
    Available download formats: html
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Teresia Buza; Triza Tonui; Francesca Stomeo; Christian Tiambo; Robab Katani; Megan Schilling; Beatus Lyimo; Paul Gwakisa; Isabella Cattadori; Joram Buza; Vivek Kapur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Metadata profiling report generated automatically by iMAP to provide a summary of the samples and the associated metadata. This report is the initial step in the RAYG (review-as-you-go) process. The report also displays the R commands that demonstrate how to reproduce it. The pipeline is set to automatically save the output in the “reports” folder as “report1_metadata_profiling.html”. (HTML 953 kb)

  3. Additional file 3: of iMAP: an integrated bioinformatics and visualization...

    • springernature.figshare.com
    html
    Updated May 31, 2023
    Cite
    Teresia Buza; Triza Tonui; Francesca Stomeo; Christian Tiambo; Robab Katani; Megan Schilling; Beatus Lyimo; Paul Gwakisa; Isabella Cattadori; Joram Buza; Vivek Kapur (2023). Additional file 3: of iMAP: an integrated bioinformatics and visualization pipeline for microbiome data analysis [Dataset]. http://doi.org/10.6084/m9.figshare.8637557.v1
    Available download formats: html
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Teresia Buza; Triza Tonui; Francesca Stomeo; Christian Tiambo; Robab Katani; Megan Schilling; Beatus Lyimo; Paul Gwakisa; Isabella Cattadori; Joram Buza; Vivek Kapur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Preprocessing report generated automatically by the iMAP to provide a summary of quality control of the reads. The iMAP pipeline automatically saved the output in the “reports” folder as “report2_read_preprocessing.html”. (HTML 3463 kb)

  4. Multidimensional scaling informed by F-statistic: Visualizing microbiome for...

    • dataone.org
    Updated Oct 14, 2025
    Cite
    Hyungseok Kim; Soobin Kim; Jeff Kimbrel; Megan Morris; Xavier Mayali; Cullen Buie (2025). Multidimensional scaling informed by F-statistic: Visualizing microbiome for inference [Dataset]. http://doi.org/10.5061/dryad.vmcvdnd3x
    Dataset updated
    Oct 14, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Hyungseok Kim; Soobin Kim; Jeff Kimbrel; Megan Morris; Xavier Mayali; Cullen Buie
    Description

    Multidimensional scaling (MDS) is a dimensionality reduction technique for microbial ecology data analysis that represents the multivariate structure while preserving pairwise distances between samples. While its improvements have enhanced the ability to reveal data patterns by sample groups, these MDS-based methods require prior assumptions for inference, limiting their application in general microbiome analysis. In this study, we introduce a new MDS-based ordination, “F-informed MDS,” which configures the data distribution based on the F-statistic, the ratio of dispersion between groups sharing common and different characteristics. Using simulated compositional datasets, we demonstrate that the proposed method is robust to hyperparameter selection while maintaining statistical significance throughout the ordination process. Various quality metrics for evaluating dimensionality reduction confirm that F-informed MDS is comparable to state-of-the-art methods in preserving both local and ...

    Multidimensional scaling informed by F-statistic: Visualizing grouped microbiome data with inference

    File: Data.zip

    Description: Raw data used in this study. Includes 3 folders and 1 file (see below).
    1. Folder Simulated contains pairwise distances and ordination results from three simulated datasets. Includes 7 subfolders and 6 files.
      • Six files are the original datasets and their associated label sets. The names are formatted as "sim_<x>-<type>.csv", where <x> is the replicate number and <type> indicates whether the file is the design matrix ("data") or response vector ("Y").
      • Seven subfolders are grouped by the ordination method. Likewise, the file ...
  5. Additional file 1: of iMAP: an integrated bioinformatics and visualization...

    • springernature.figshare.com
    xlsx
    Updated Jun 4, 2023
    Cite
    Teresia Buza; Triza Tonui; Francesca Stomeo; Christian Tiambo; Robab Katani; Megan Schilling; Beatus Lyimo; Paul Gwakisa; Isabella Cattadori; Joram Buza; Vivek Kapur (2023). Additional file 1: of iMAP: an integrated bioinformatics and visualization pipeline for microbiome data analysis [Dataset]. http://doi.org/10.6084/m9.figshare.8637539.v1
    Available download formats: xlsx
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Teresia Buza; Triza Tonui; Francesca Stomeo; Christian Tiambo; Robab Katani; Megan Schilling; Beatus Lyimo; Paul Gwakisa; Isabella Cattadori; Joram Buza; Vivek Kapur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Format of input files. Includes sample-metadata mapping (sheet 1), sample-read-file mapping in mothur format (sheet 2), and sample-variable mapping (sheets 3, 4, and 5). (XLSX 69 kb)

  6. Data_Sheet_1_Compositional Data Analysis of Periodontal Disease Microbial...

    • datasetcatalog.nlm.nih.gov
    Updated May 17, 2021
    Cite
    Ortiz-Velez, Adrian; Kelley, Scott T.; Sisk-Hackworth, Laura; Reed, Micheal B. (2021). Data_Sheet_1_Compositional Data Analysis of Periodontal Disease Microbial Communities.ZIP [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000921744
    Dataset updated
    May 17, 2021
    Authors
    Ortiz-Velez, Adrian; Kelley, Scott T.; Sisk-Hackworth, Laura; Reed, Micheal B.
    Description

    Periodontal disease (PD) is a chronic, progressive polymicrobial disease that induces a strong host immune response. Culture-independent methods, such as next-generation sequencing (NGS) of bacteria 16S amplicon and shotgun metagenomic libraries, have greatly expanded our understanding of PD biodiversity, identified novel PD microbial associations, and shown that PD biodiversity increases with pocket depth. NGS studies have also found PD communities to be highly host-specific in terms of both biodiversity and the response of microbial communities to periodontal treatment. As with most microbiome work, the majority of PD microbiome studies use standard data normalization procedures that do not account for the compositional nature of NGS microbiome data. Here, we apply recently developed compositional data analysis (CoDA) approaches and software tools to reanalyze multiomics (16S, metagenomics, and metabolomics) data generated from previously published periodontal disease studies. CoDA methods, such as centered log-ratio (clr) transformation, compensate for the compositional nature of these data, which can not only remove spurious correlations but also allow for the identification of novel associations between microbial features and disease conditions. We validated many of the studies’ original findings, but also identified new features associated with periodontal disease, including the genera Schwartzia and Aerococcus and the cytokine C-reactive protein (CRP). Furthermore, our network analysis revealed a lower connectivity among taxa in deeper periodontal pockets, potentially indicative of a more “random” microbiome. Our findings illustrate the utility of CoDA techniques in multiomics compositional data analysis of the oral microbiome.
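
    The centered log-ratio (clr) transformation mentioned in this description can be illustrated with a minimal sketch. This is not the authors' code: the function name, the pseudocount of 0.5 used to handle zero counts, and the sample counts are all assumptions made for illustration.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's feature counts.

    Each value becomes log(x_i) minus the mean of all log(x_j), which is
    the log of x_i divided by the geometric mean of the sample.
    The pseudocount is an assumption of this sketch so that zero counts
    have a finite logarithm.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical counts for four taxa in one sample.
sample = [120, 30, 0, 850]
transformed = clr(sample)
print([round(v, 3) for v in transformed])
```

    By construction the clr values of a sample sum to zero, so each value reads as an abundance relative to the sample's geometric mean rather than as an absolute quantity.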

  7. Supporting data for “Computational analysis of shotgun metagenomic data from...

    • datahub.hku.hk
    Updated Aug 25, 2023
    Cite
    Gordon Qian (2023). Supporting data for “Computational analysis of shotgun metagenomic data from human gut microbiota”. [Dataset]. http://doi.org/10.25442/hku.23939673.v1
    Dataset updated
    Aug 25, 2023
    Dataset provided by
    HKU DataHub
    Authors
    Gordon Qian
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset contains raw source data and scripts written to generate figures for the thesis entitled “Computational analysis of shotgun metagenomic data from human gut microbiota”. This includes raw species abundance tables used for statistical analysis and methods evaluation, clinical patient records and blood/stool biochemical measurements. Files are separated by their chapter contribution. Chapter 2 files are related to methods comparison and evaluation in gut metagenomics species abundance estimation. Chapter 3 primarily contains raw patient clinical data and their gut microbial compositions for statistical analysis. Chapter 4 contains files related to the scripts utilized to investigate the phenomenon of read coverage bias in human gut metagenomic data.

  8. Metagenomics Analysis GSE199245 Diversity Analysis

    • kaggle.com
    zip
    Updated Dec 4, 2025
    Cite
    Dr. Nagendra (2025). Metagenomics Analysis GSE199245 Diversity Analysis [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/metagenomics-analysis-gse199245-diversity-analysis
    Available download formats: zip (1132713 bytes)
    Dataset updated
    Dec 4, 2025
    Authors
    Dr. Nagendra
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains metagenomic sequencing data related to human gut microbiome analysis.

    It is derived from the publicly available dataset GSE199245.

    The dataset focuses on diversity analysis, including alpha and beta diversity metrics.

    It provides insights into microbial community composition in different samples.

    Data includes processed tables and visualizations to facilitate ecological and statistical analyses.

    The dataset can be used to compare microbial diversity between health and disease states.

    It is suitable for researchers working in metagenomics, microbiome studies, and bioinformatics.

    Includes both graphical outputs and tabular summaries of diversity indices.

    Designed for reproducible analysis and further downstream computational studies.

    Offers a convenient starting point for machine learning or statistical modeling of microbiome data.

    Can help in understanding microbial interactions and their potential impact on human health.

    Supports teaching, research, and hypothesis generation in microbial ecology.

    Enables comparison of Shannon, Simpson, and other alpha diversity indices across samples.

    Provides visual tools to interpret community richness and evenness effectively.
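
    The Shannon and Simpson indices named in this description can be computed from a vector of per-taxon counts. The sketch below is illustrative only and is not taken from the dataset: the function names and example counts are hypothetical, the Shannon index uses the natural logarithm, and "Simpson" is taken here to mean the Gini-Simpson form 1 - sum(p_i^2), which is one of several conventions in use.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def simpson(counts):
    """Gini-Simpson index 1 - sum(p_i^2); assumed convention for this sketch."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical samples: one perfectly even, one dominated by a single taxon.
even_sample = [10, 10, 10, 10]
skewed_sample = [37, 1, 1, 1]

for name, sample in [("even", even_sample), ("skewed", skewed_sample)]:
    print(name, round(shannon(sample), 3), round(simpson(sample), 3))
```

    Both indices are highest for the even sample, which is why they are commonly read as summaries of richness and evenness together.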

  9. Alterations in gut microbiota do not play a causal role in diet-independent...

    • zenodo.org
    bin, csv, tsv, txt
    Updated Nov 7, 2020
    Cite
    Christine Olmstead; Scott Kelley (2020). Alterations in gut microbiota do not play a causal role in diet-independent weight gain caused by ovariectomy [Dataset]. http://doi.org/10.5281/zenodo.4203456
    Available download formats: tsv, bin, csv, txt
    Dataset updated
    Nov 7, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christine Olmstead; Scott Kelley
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These files are associated with the following publication: https://doi.org/10.1210/jendso/bvaa173

    The sequence data are available at the European Nucleotide Archive: PRJEB40801

    This link contains the metadata, sequence reads, and analysis files used in the study "Alterations in gut microbiota do not play a causal role in diet-independent weight gain caused by ovariectomy."

    Alpha_diversity files:
    File: AlphaDiversity_analysis_sham_ovex
    Description: R statistical analysis file for Faith's Phylogenetic Diversity (Faith's PD) and Observed Sequence Variant (SV) alpha diversity metrics
    File: faith_pd_sham_ovex
    Description: QIIME2 output file for Faith's PD alpha diversity measurements for sham/ovex samples
    File: obserevd_svs_sham_ovex
    Description: QIIME2 output file for Observed SVs alpha diversity measurements for sham/ovex samples

    Beta_diversity files:
    File: BetaDiversity_analysis_sham_ovex
    Description: R statistical analysis file for beta diversity metrics
    File: merged.sv.sham.ovex
    Description: Combined SV table and taxa table for sham/ovex samples
    File: sv.sham.ovex
    Description: SV table for sham/ovex samples
    File: table.sham.ovex.biom
    Description: BIOM-formatted file for combined SV and taxa data. (For import into Phyloseq)
    File: tax.sham.ovex
    Description: Taxa table for sham/ovex samples
    File: tree.nwk
    Description: Phylogenetic tree for sham/ovex data (For import into Phyloseq)

    DESeq2 analysis files:
    File: merged.sv.sham.ovex.trimmed
    Description: Combined SV table and taxa table for sham/ovex samples. SVs found in 4 or fewer samples removed.
    File: sv.table.sham.ovex.trimmed
    Description: SV table for sham/ovex samples. SVs found in 4 or fewer samples removed.
    File: sham.ovex.trimmed.biom
    Description: BIOM-formatted file for combined SV and taxa data. SVs found in 4 or fewer samples removed. (For import into Phyloseq)
    File: tax.sham.ovex.trimmed
    Description: Taxa table for sham/ovex samples. SVs found in 4 or fewer samples removed.
    File: tree.trimmed.nwk
    Description: Phylogenetic tree for sham/ovex data. SVs found in 4 or fewer samples removed. (For import into Phyloseq)
    File: Phyloseq.DeSeq2.Ovex.Sham
    Description: Log2 Fold change analysis (relative species abundance) done in DESeq2 for time points 1-5.
    File: Phyloseq.DeSeq2.Ovex.Sham.week3
    Description: Log2 Fold change analysis (relative species abundance) done in DESeq2 for time point 3.
    File: Phyloseq.DeSeq2.Ovex.Sham.week4
    Description: Log2 Fold change analysis (relative species abundance) done in DESeq2 for time point 4.
    File: Phyloseq.DeSeq2.Ovex.Sham.week5
    Description: Log2 Fold change analysis (relative species abundance) done in DESeq2 for time point 5.
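
    The trimmed files above drop sequence variants detected in four or fewer samples. A minimal sketch of such a prevalence filter follows. It is an illustration, not the study's script: the function name, the dictionary layout, the SV identifiers, and the counts are hypothetical, and "detected" is assumed to mean a count greater than zero.

```python
def filter_by_prevalence(table, min_samples=5):
    """Keep features detected (count > 0) in at least `min_samples` samples.

    `table` maps a feature ID to its list of per-sample counts.
    min_samples=5 removes features found in 4 or fewer samples.
    """
    kept = {}
    for feature, counts in table.items():
        prevalence = sum(1 for c in counts if c > 0)
        if prevalence >= min_samples:
            kept[feature] = counts
    return kept


# Hypothetical SV table with six samples per feature.
table = {
    "SV_a": [3, 0, 5, 1, 2, 7],  # detected in 5 samples, kept
    "SV_b": [0, 0, 9, 0, 1, 0],  # detected in 2 samples, removed
    "SV_c": [1, 1, 1, 1, 0, 0],  # detected in 4 samples, removed
}
trimmed = filter_by_prevalence(table)
print(sorted(trimmed))
```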


    Mapping_files including metadata (for use with sequences below):
    File: ovex_mapping
    Description: Mapping file - maps barcodes to samples
    File: ovex_mapping_samples removed
    Description: Mapping file - maps barcodes to reads. Two samples removed for low sequence count.
    1. Plate2 A08 806rcbc103 GCG AGC GAA GTA CCG GAC TAC HVG GGT WTC TAA T 8 870 (T2) Ovex F
    2. Plate2 C02 806rcbc121 GCA ATT AGG TAC CCG GAC TAC HVG GGT WTC TAA T 26 888 (T2) Co-Sham O
    File: ovex_mapping_sham_ovex_samples removed
    Description: Mapping file - maps barcodes to reads. Sham/ovex samples only. One sample removed for low sequence count.
    1. Plate2 A08 806rcbc103 GCG AGC GAA GTA CCG GAC TAC HVG GGT WTC TAA T 8 870 (T2) Ovex F

    QIIME2 Script:

    File: QIIME2_sham_ovex
    Description: This file includes the commands used in the QIIME2 pipeline.

  10. Data from: A Sensitivity Analysis of Methodological Variables Associated...

    • data.nist.gov
    Updated Oct 5, 2023
    Cite
    National Institute of Standards and Technology (2023). A Sensitivity Analysis of Methodological Variables Associated with Microbiome Measurements [Dataset]. http://doi.org/10.18434/mds2-3092
    Dataset updated
    Oct 5, 2023
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    License

    https://www.nist.gov/open/license

    Description

    This repository provides the raw data, analysis code, and results generated during a systematic evaluation of the impact of selected experimental protocol choices on the metagenomic sequencing analysis of microbiome samples. Briefly, a full factorial experimental design was implemented varying biological sample (n=5), operator (n=2), lot (n=2), extraction kit (n=2), 16S variable region (n=2), and reference database (n=3), and the main effects were calculated and compared between parameters (bias effects) and samples (real biological differences). A full description of the effort is provided in the associated publication.
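
    The size of the full factorial design described above follows from multiplying the level counts: 5 x 2 x 2 x 2 x 2 x 3 = 240 combinations. The sketch below enumerates them. Only the factor names and level counts come from the description; the level labels are hypothetical placeholders, and this is not the repository's analysis code.

```python
from itertools import product

# Level counts are from the description; the labels are placeholders.
factors = {
    "biological_sample": ["S1", "S2", "S3", "S4", "S5"],
    "operator": ["op1", "op2"],
    "lot": ["lot1", "lot2"],
    "extraction_kit": ["kit1", "kit2"],
    "variable_region": ["region1", "region2"],
    "reference_database": ["db1", "db2", "db3"],
}

# A full factorial design contains every combination of factor levels.
design = [dict(zip(factors, levels)) for levels in product(*factors.values())]

print(len(design))
print(design[0])
```

    Because every level of each factor appears equally often with every level of the others, a main effect can be estimated by averaging the response over all runs at each level of a single factor.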

  11. The global Microbiome Sequencing Services market size will be USD 1529.8...

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Updated Nov 15, 2025
    Cite
    Cognitive Market Research (2025). The global Microbiome Sequencing Services market size will be USD 1529.8 million in 2025. [Dataset]. https://www.cognitivemarketresearch.com/microbiome-sequencing-service-market-report
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Nov 15, 2025
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    According to Cognitive Market Research, the global Microbiome Sequencing Services market size will be USD 1529.8 million in 2025. It will expand at a compound annual growth rate (CAGR) of 11.50% from 2025 to 2033.

    North America held the largest market share, more than 40% of global revenue, with a market size of USD 566.03 million in 2025, and will grow at a compound annual growth rate (CAGR) of 9.3% from 2025 to 2033.
    Europe accounted for over 30% of global revenue, with a market size of USD 443.64 million.
    APAC held around 23% of global revenue, with a market size of USD 367.15 million in 2025, and will grow at a CAGR of 13.5% from 2025 to 2033.
    South America held more than 5% of global revenue, with a market size of USD 58.13 million in 2025, and will grow at a CAGR of 10.5% from 2025 to 2033.
    The Middle East held around 2% of global revenue, with an estimated market size of USD 61.19 million in 2025, and will grow at a CAGR of 10.8% from 2025 to 2033.
    Africa held around 1% of global revenue, with an estimated market size of USD 33.66 million in 2025, and will grow at a CAGR of 11.2% from 2025 to 2033.
    Sequencing by Synthesis category is the fastest growing segment of the Microbiome Sequencing Services industry
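
    As a worked illustration of what a compound annual growth rate implies, the sketch below compounds the stated 2025 global figure forward at the stated 11.50% rate. It assumes annual compounding over the eight years from 2025 to 2033, which is an interpretation and not something the report spells out; the function name is hypothetical and the output is arithmetic only, not a figure from the report.

```python
def project(value, rate, years):
    """Compound `value` forward by `rate` per year for `years` years."""
    return value * (1.0 + rate) ** years


base_2025 = 1529.8  # USD million, global market size stated for 2025
cagr = 0.115        # 11.50% per year, as stated

# Assumes 2025 is year 0 and growth compounds once per year through 2033.
for year in range(2025, 2034):
    print(year, round(project(base_2025, cagr, year - 2025), 1))
```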
    

    Market Dynamics of Microbiome Sequencing Services Market

    Key Drivers for Microbiome Sequencing Services Market

    Rising Prevalence of Chronic Diseases and Lifestyle Disorders to Boost Market Growth

    The increasing incidence of chronic conditions such as obesity, diabetes, gastrointestinal disorders, and autoimmune diseases is a major driver of the microbiome sequencing services market. Research increasingly shows that gut microbiota plays a significant role in immune system regulation, metabolism, and inflammation pathways—critical factors in the development and progression of chronic illnesses. This has heightened the interest of healthcare providers and researchers in microbiome analysis to understand disease mechanisms, identify microbial biomarkers, and develop microbiome-targeted therapies. Additionally, lifestyle changes, poor dietary habits, and environmental exposures further disrupt the gut microbial balance, leading to demand for advanced diagnostic services. Microbiome sequencing enables high-resolution analysis of microbial diversity, composition, and function, helping to tailor personalized treatment plans. For instance, OraSure Technologies, under its Diversigen arm, introduced a service for gut microbiota sample metatranscriptomic sequencing and analysis, advancing capabilities in understanding microbiome dynamics for research and clinical applications.

    https://orasure.com/

    Advancements in Next-Generation Sequencing (NGS) Technologies To Boost Market Growth

    Technological innovations, particularly in next-generation sequencing (NGS), are significantly accelerating the growth of microbiome sequencing services. Modern NGS platforms offer rapid, high-throughput, and cost-effective methods to analyze complex microbial communities with unmatched accuracy and depth. These advancements allow researchers to sequence millions of DNA fragments simultaneously, leading to comprehensive profiling of microbial genomes and their functional genes. Furthermore, the integration of bioinformatics and cloud-based data analysis tools enhances the interpretation of massive datasets generated through sequencing, enabling more meaningful insights into microbiome roles in health and disease.

    Restraint Factor for the Microbiome Sequencing Services Market

    High Cost of Sequencing and Data Analysis Will Limit Market Growth

    The major restraining factor for the microbiome sequencing services market is the high cost associated with sequencing procedures and subsequent data analysis. Although the cost of sequencing technologies like 16S rRNA and whole-genome shotgun sequencing has decreased over time, it still remains substantial, particularly for large-scale or longitudinal studies. Additionally, the infrastructure required for sample processing, high-throughput sequencing platforms, and advanced bioinformatics tools significantly increases the overall project cost. Many small- to mid-sized research labs, clinical settings, or biotech st...

  12. Additional file 4: of iMAP: an integrated bioinformatics and visualization...

    • springernature.figshare.com
    Updated Jun 2, 2023
    Cite
    Teresia Buza; Triza Tonui; Francesca Stomeo; Christian Tiambo; Robab Katani; Megan Schilling; Beatus Lyimo; Paul Gwakisa; Isabella Cattadori; Joram Buza; Vivek Kapur (2023). Additional file 4: of iMAP: an integrated bioinformatics and visualization pipeline for microbiome data analysis [Dataset]. http://doi.org/10.6084/m9.figshare.8637563.v1
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Teresia Buza; Triza Tonui; Francesca Stomeo; Christian Tiambo; Robab Katani; Megan Schilling; Beatus Lyimo; Paul Gwakisa; Isabella Cattadori; Joram Buza; Vivek Kapur
    Description

    Sequence processing report generated automatically by the iMAP to provide a summary of the output. The report was automatically saved in the “reports” folder as “report3_sequence_processing.html”. (HTML 4205 kb)

  13. MicrobiomeHD: the human gut microbiome in health and disease

    • zenodo.org
    • search.datacite.org
    application/gzip
    Updated Jan 24, 2020
    Cite
    Claire Duvallet; Sean Gibbons; Thomas Gurry; Rafael Irizarry; Eric Alm (2020). MicrobiomeHD: the human gut microbiome in health and disease [Dataset]. http://doi.org/10.5281/zenodo.569601
    Available download formats: application/gzip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Claire Duvallet; Sean Gibbons; Thomas Gurry; Rafael Irizarry; Eric Alm
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Overview

    MicrobiomeHD is a standardized database of human gut microbiome studies in health and disease. This database includes publicly available 16S data from published case-control studies and their associated patient metadata. Raw sequencing data for each study was downloaded and processed through a standardized pipeline.

    To be included in MicrobiomeHD, datasets have:

    • publicly available raw sequencing data (fastq or fasta)
    • publicly available metadata with at least case and control labels for each patient
    • at least 15 case patients

    Currently, MicrobiomeHD is focused on stool samples. Additional samples may be included in certain datasets, as indicated in the metadata.

    Files

    Additional information about the datasets included in this MicrobiomeHD release is in the MicrobiomeHD github repo https://github.com/cduvallet/microbiomeHD, in the file db/dataset_info.yaml. Top-level identifiers correspond to the dataset IDs used in Duvallet et al. 2017. Sample sizes in the yaml file are those that were described in the papers, and may not exactly reflect the actual data (due to missing/extra data, samples which didn't pass quality control, etc).

    Each dataset was downloaded and processed through a standardized pipeline. The raw processing results are available in the *.tar.gz files here. Each file has the same directory structure and files, as described in the pipeline documentation: http://amplicon-sequencing-pipeline.readthedocs.io/en/latest/output.html.

    Specific files of interest include:

    • summary_file.txt: this file contains a summary of all parameters used to process the data
    • datasetID.metadata.txt: the metadata associated with the samples. Note that some samples in the metadata may not have sequencing data, and vice versa.
    • RDP/datasetID.otu_table.100.denovo.rdp_assigned: the 100% OTU tables with Latin taxonomic names assigned using the RDP classifier.
    • datasetID.otu_seqs.100.fasta: representative sequences for each OTU in the 100% OTU table. OTU labels in the OTU table end with d_denovoID - these denovoIDs correspond to the sequences in this file.

    Processing

    The raw data was acquired as described in the supplementary materials of Duvallet et al.'s "Meta analysis of microbiome studies identifies shared and disease-specific patterns".

    Raw sequencing data was processed with the Alm lab's in-house 16S processing pipeline: https://github.com/thomasgurry/amplicon_sequencing_pipeline

    Pipeline documentation is available at: http://amplicon-sequencing-pipeline.readthedocs.io/

    Metadata was extracted from the original papers and/or data sources, and formatted manually.

    Contributing

    MicrobiomeHD is a resource that can be used to extract disease-specific microbiome signals in individual case-control studies. Many microbes respond non-specifically to health and disease, and the majority of bacterial associations within individual studies overlap with this "core" response. Researchers should cross-check their results with the data presented here to ensure that their identified microbial associations are specific to their disease under study.

    We provide an updated list of "core" microbes here, as well as the raw OTU tables for anyone who wishes to reproduce and adapt this analysis to their study question.

    If you would like to include your case-control dataset in MicrobiomeHD, please email duvallet[at]mit.edu.

    For us to process your data through our standard pipeline, you will need to provide the following files and information about your data:

    • raw sequencing data in fastq or fasta format (preferably fastq)
    • information about which processing steps will be required (e.g. removing primers or barcodes, merging paired-end reads, etc)
    • sample IDs associated with the sequencing data (either mapped to barcodes still in the sequences, or to each de-multiplexed sequencing file)
    • case/control metadata of each sample
    • other relevant metadata (e.g. sampling site, if not all samples are stool; sampling time point, if multiple samples per patient were taken; etc)

    By using MicrobiomeHD in your own analyses, you agree to contribute your dataset to this database and to make your raw sequencing data (i.e. fastq files) publicly available.

    Citing MicrobiomeHD

    The MicrobiomeHD database and original publications for each of these datasets are described in Duvallet et al. (2017): http://biorxiv.org/content/early/2017/05/08/134031

    If you use any of these datasets in your analysis, please cite both MicrobiomeHD (Duvallet et al. (2017)) and the original publication for each dataset that you use.

    The code used to process and analyze this data in Duvallet et al. (2017) is available on github: https://github.com/cduvallet/microbiomeHD

    Files

    Core genera

    file-S3.core_genera.txt: Supplemental Table 3 from Duvallet et al. (2017), listing the core health- and disease-associated microbes.

    Datasets

    Note that MicrobiomeHD contains all 28 datasets from Duvallet et al. (2017), as well as additional datasets which did not meet the inclusion criteria for the meta-analysis presented in the paper. Additional information about the datasets included in this MicrobiomeHD release is in the original publications and the MicrobiomeHD github repo https://github.com/cduvallet/microbiomeHD, in the file db/dataset_info.yaml.

    The sample sizes listed here reflect what was reported in the original publications. Some may have discrepancies between what is reported and what is in the actual data due to missing data, quality issues, barcode mismatches, etc.

  14. Microplastics Fish Gut Microbiome Data For EDA/ML

    • kaggle.com
    zip
    Updated Jul 19, 2025
    Cite
    ISMAILDRISSI25 (2025). Microplastics Fish Gut Microbiome Data For EDA/ML [Dataset]. https://www.kaggle.com/datasets/ismaildrissi25/microplastics-fish-gut-microbiome-data-for-ml
    Explore at:
    Available download formats: zip (252677 bytes)
    Dataset updated
    Jul 19, 2025
    Authors
    ISMAILDRISSI25
    License

    Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    This dataset was compiled for a Master's thesis project focused on investigating the gut microbiota response in fish exposed to microplastics. It contains cleaned and annotated metadata along with taxonomic abundance information and exposure features, prepared for predictive machine learning modeling.

    Context: Microplastics (MPs) are emerging pollutants in aquatic ecosystems. Numerous studies have shown that MPs can impact the gut microbial composition of fish. This dataset integrates data from multiple studies through a meta-analysis approach, standardized using bioinformatics and machine learning pipelines.

    Source: Sequences and metadata were extracted from public BioProject entries in the NCBI SRA database.

    Data processing: QIIME2, Python (pandas, scikit-learn), Google Colab

    Total size: ~648 FASTQ files → summarized into machine learning-ready tabular format

    Applications:

    • Microbiome classification modeling
    • Environmental ecotoxicology analysis
    • Meta-analysis benchmarking
    • Feature importance and interpretability (SHAP, feature selection)

  15. Probiotic intervention for obesity

    • scidb.cn
    Updated Dec 2, 2025
    Cite
    daliang huo; Xiaogang Wang (2025). Probiotic intervention for obesity [Dataset]. http://doi.org/10.57760/sciencedb.32707
    Dataset updated
    Dec 2, 2025
    Dataset provided by
    Science Data Bank
    Authors
    daliang huo; Xiaogang Wang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Description: Gut Microbiome and Metabolomics Dataset

    Data Generation Process and Processing Methods: This dataset contains integrated data from gut microbiome and metabolomics analyses performed on C57BL/6J mice to investigate the relationship between gut microbiota composition and host metabolic status. The data generation process includes the following steps:

    • Microbiome Data: Gut microbiome data were generated using 16S rRNA gene sequencing. DNA was extracted from mouse fecal samples, amplified with specific primers, and sequenced on an Illumina MiSeq platform. Data were processed using the QIAseq 16S RNA analysis pipeline.
    • Metabolomics Data: Metabolomics data were obtained via liquid chromatography-mass spectrometry (LC-MS). An Agilent 1290 Infinity LC system coupled with an Agilent 6460 QQQ mass spectrometer was used for analysis. Samples were extracted using acetonitrile-water mixtures, and analyzed over a 0-15 minute window on the LC-MS system.
    • Data Processing: The microbiome data were processed with QIIME2 for quality control and sequence joining. Metabolomics data were analyzed using MassHunter software for peak detection and quantification. All data were normalized and subjected to statistical analyses (e.g., ANOVA, multiple comparisons).

    Missing Data Information: No systemic missing data were observed during sample collection. However, in some experimental groups, due to technical issues or insufficient sample sizes, a small percentage of metabolomics or microbiome data were missing.

    Software and Format Information: The data files are in CSV format, which is compatible with software such as Excel, R, and Python for data analysis. Microbiome data processing was performed using the QIIME2 platform, and further statistical analyses were conducted using R.

  16. HMP Data Analysis and Coordination Center

    • rrid.site
    Updated Jan 29, 2022
    Cite
    (2022). HMP Data Analysis and Coordination Center [Dataset]. http://identifiers.org/RRID:SCR_004919
    Dataset updated
    Jan 29, 2022
    Description

    Common repository for diverse human microbiome datasets and minimum reporting standards for the Common Fund Human Microbiome Project.

  17. Microbiome Sequencing Services Market Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jan 30, 2026
    Cite
    Data Insights Market (2026). Microbiome Sequencing Services Market Report [Dataset]. https://www.datainsightsmarket.com/reports/microbiome-sequencing-services-market-8882
    Explore at:
    Available download formats: ppt, pdf, doc
    Dataset updated
    Jan 30, 2026
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2026 - 2034
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Microbiome Sequencing Services Market was valued at USD 1.71 Million in 2024 and is projected to reach USD 2.69 Million by 2033, with an expected CAGR of 6.70% during the forecast period. Recent developments include: November 2023: QIAGEN NV launched the Microbiome WGS (whole-genome sequencing) SeqSets, a comprehensive Sample to Insight workflow designed to provide an easy-to-use solution that maximizes efficiency and reproducibility in microbiome research; June 2023: Zymo Research launched its full-length 16S sequencing service, offering researchers high-quality, full-length 16S rRNA gene sequencing for microbiome analysis. Key drivers for this market are: Huge Investment in Microbiome Research, Rise in Demand for NGS Services; Surge in Genomic Research and Widening Application Area of Microbiome Sequencing. Potential restraints include: Ethical and Legal Issues Related to Genome Sequencing, Lack of Skilled Technicians for NGS Data Analysis. Notable trends are: The Oncology Segment is Expected to Hold a Significant Market Share Over the Forecast Period.
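
    The growth figure in this description is a compound annual growth rate (CAGR). As a minimal sketch of that arithmetic, assuming the standard definition CAGR = (end / start)^(1 / years) - 1, the function names and sample inputs below are illustrative only. The listing does not state which base year or period length its 6.70% figure uses, so this is not a reproduction of the report's calculation.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value, rate, years):
    """Value reached after compounding `rate` once per year for `years` years."""
    return start_value * (1.0 + rate) ** years


# Illustrative numbers only (not taken from the report):
# a value that grows from 100 to 121 over 2 years.
growth = cagr(100.0, 121.0, 2)
projected = project(100.0, growth, 2)
```

    Running `project` with the rate returned by `cagr` recovers the end value, which is a quick consistency check on any reported start value, end value, and growth rate.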

  18. Data_Sheet_1_Overview of data preprocessing for machine learning...

    • datasetcatalog.nlm.nih.gov
    Updated Oct 5, 2023
    + more versions
    Cite
    D’Elia, Domenica; Stres, Blaž; Hron, Karel; Dhamo, Xhilda; Ibrahimi, Eliana; Berland, Magali; Shigdel, Rajesh; Marcos-Zambrano, Laura Judith; Simeon, Andrea; Lopes, Marta B. (2023). Data_Sheet_1_Overview of data preprocessing for machine learning applications in human microbiome research.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001030478
    Dataset updated
    Oct 5, 2023
    Authors
    D’Elia, Domenica; Stres, Blaž; Hron, Karel; Dhamo, Xhilda; Ibrahimi, Eliana; Berland, Magali; Shigdel, Rajesh; Marcos-Zambrano, Laura Judith; Simeon, Andrea; Lopes, Marta B.
    Description

    Although metagenomic sequencing is now the preferred technique to study microbiome-host interactions, analyzing and interpreting microbiome sequencing data presents challenges primarily attributed to the statistical specificities of the data (e.g., sparse, over-dispersed, compositional, inter-variable dependency). This mini review explores preprocessing and transformation methods applied in recent human microbiome studies to address microbiome data analysis challenges. Our results indicate a limited adoption of transformation methods targeting the statistical characteristics of microbiome sequencing data. Instead, there is a prevalent usage of relative and normalization-based transformations that do not specifically account for the specific attributes of microbiome data. The information on preprocessing and transformations applied to the data before analysis was incomplete or missing in many publications, leading to reproducibility concerns, comparability issues, and questionable results. We hope this mini review will provide researchers and newcomers to the field of human microbiome research with an up-to-date point of reference for various data transformation tools and assist them in choosing the most suitable transformation method based on their research questions, objectives, and data characteristics.
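
    To make the contrast in this abstract concrete, the sketch below compares a plain relative-abundance transformation with the centered log-ratio (CLR) transform, a widely used compositional transformation. The abstract itself does not name CLR, so its choice here, the pseudocount value, and the function names are assumptions made for illustration.

```python
import math


def relative_abundance(counts):
    """Scale one sample's counts so they sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample.

    A pseudocount is added first because log(0) is undefined and
    microbiome count tables are sparse. The value 0.5 is an assumption.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical counts for three taxa in one sample.
sample = [10, 0, 90]
rel = relative_abundance(sample)
clr_values = clr(sample)
```

    Relative abundances are constrained to sum to 1, whereas CLR values sum to 0 and express each taxon relative to the geometric mean of the sample.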

  19. Results of a Galaxy metagenomic analysis of bee gut microbiome data from...

    • zenodo.org
    bin, csv, html, zip
    Updated Jul 26, 2024
    Cite
    Géraldine PIOT; Géraldine PIOT (2024). Results of a Galaxy metagenomic analysis of bee gut microbiome data from PRJNA977416 [Dataset]. http://doi.org/10.5281/zenodo.12905608
    Explore at:
    Available download formats: zip, bin, html, csv
    Dataset updated
    Jul 26, 2024
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Géraldine PIOT; Géraldine PIOT
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 26, 2024
    Description

    This dataset contains the outputs of a metagenomic Galaxy workflow run on the raw data of the project PRJNA977416, including the CSV file of associated metadata and the workflow.ga used for the analysis.

    Firstly, it has information on taxonomic assignment, with:

    • the reports of all samples for the Kraken2, Bracken, and MetaPhlAn taxonomic profilers.
    • two tabular files obtained with Taxpasta, which merge samples and standardize taxonomic abundances.
    • for the Bracken standardised abundance, a file with the calculated alpha diversity measures.
    • two HTML files giving access to the Krona diagram for this taxonomic composition.

    Secondly, it contains functional information, with:

    • a tabular file with the relative abundance of all GO terms for all samples
    • a directory detailing the pathways and gene families detected.
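
    One of the files listed above holds alpha diversity measures computed from the Bracken abundances. The listing does not say which measures were used. As a hedged illustration, the sketch below computes two common ones, observed richness and the Shannon index, for a single sample. The function names and example abundances are hypothetical.

```python
import math


def observed_richness(abundances):
    """Number of taxa with a non-zero abundance in the sample."""
    return sum(1 for a in abundances if a > 0)


def shannon_index(abundances):
    """Shannon diversity H = -sum(p * ln p) over taxa with p > 0."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("sample has no reads")
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical abundances: four equally abundant taxa and one absent taxon.
even_sample = [25, 25, 25, 25, 0]
richness = observed_richness(even_sample)
diversity = shannon_index(even_sample)
```

    For a perfectly even sample the Shannon index equals the natural log of the richness, which is a convenient sanity check when comparing against values from another tool.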
  20. Additional file 5: of iMAP: an integrated bioinformatics and visualization...

    • springernature.figshare.com
    html
    Updated May 31, 2023
    Cite
    Teresia Buza; Triza Tonui; Francesca Stomeo; Christian Tiambo; Robab Katani; Megan Schilling; Beatus Lyimo; Paul Gwakisa; Isabella Cattadori; Joram Buza; Vivek Kapur (2023). Additional file 5: of iMAP: an integrated bioinformatics and visualization pipeline for microbiome data analysis [Dataset]. http://doi.org/10.6084/m9.figshare.8637575.v1
    Explore at:
    Available download formats: html
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare http://figshare.com/
    Authors
    Teresia Buza; Triza Tonui; Francesca Stomeo; Christian Tiambo; Robab Katani; Megan Schilling; Beatus Lyimo; Paul Gwakisa; Isabella Cattadori; Joram Buza; Vivek Kapur
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Preliminary analysis report generated automatically by the iMAP to provide a summary of conserved taxonomy assigned to OTUs and the initial analysis of OTUs and taxa data. The preliminary analysis report was automatically saved in the “reports” folder as “report4_preliminary_analysis.html”. (HTML 20379 kb)
