76 datasets found
  1. Additional file 3: of iMAP: an integrated bioinformatics and visualization...

    • springernature.figshare.com
    • datasetcatalog.nlm.nih.gov
    • +1 more
    html
    Updated May 31, 2023
    Teresia Buza; Triza Tonui; Francesca Stomeo; Christian Tiambo; Robab Katani; Megan Schilling; Beatus Lyimo; Paul Gwakisa; Isabella Cattadori; Joram Buza; Vivek Kapur (2023). Additional file 3: of iMAP: an integrated bioinformatics and visualization pipeline for microbiome data analysis [Dataset]. http://doi.org/10.6084/m9.figshare.8637557.v1
    Available download formats
    html
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Teresia Buza; Triza Tonui; Francesca Stomeo; Christian Tiambo; Robab Katani; Megan Schilling; Beatus Lyimo; Paul Gwakisa; Isabella Cattadori; Joram Buza; Vivek Kapur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Preprocessing report generated automatically by the iMAP to provide a summary of quality control of the reads. The iMAP pipeline automatically saved the output in the “reports” folder as “report2_read_preprocessing.html”. (HTML 3463 kb)

  2. Additional file 2: of iMAP: an integrated bioinformatics and visualization...

    • springernature.figshare.com
    html
    Updated Jun 2, 2023
    Teresia Buza; Triza Tonui; Francesca Stomeo; Christian Tiambo; Robab Katani; Megan Schilling; Beatus Lyimo; Paul Gwakisa; Isabella Cattadori; Joram Buza; Vivek Kapur (2023). Additional file 2: of iMAP: an integrated bioinformatics and visualization pipeline for microbiome data analysis [Dataset]. http://doi.org/10.6084/m9.figshare.8637551.v1
    Available download formats
    html
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Teresia Buza; Triza Tonui; Francesca Stomeo; Christian Tiambo; Robab Katani; Megan Schilling; Beatus Lyimo; Paul Gwakisa; Isabella Cattadori; Joram Buza; Vivek Kapur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Metadata profiling report generated automatically by the iMAP to provide a summary of the samples and the associated metadata. This report is the initial step in the RAYG (review-as-you-go) process. The report also displays the R commands that demonstrate how to reproduce the report. The pipeline is set to automatically save the output in the “reports” folder as “report1_metadata_profiling.html”. (HTML 953 kb)

  3. Additional file 4: of iMAP: an integrated bioinformatics and visualization...

    • springernature.figshare.com
    html
    Updated Jun 2, 2023
    Teresia Buza; Triza Tonui; Francesca Stomeo; Christian Tiambo; Robab Katani; Megan Schilling; Beatus Lyimo; Paul Gwakisa; Isabella Cattadori; Joram Buza; Vivek Kapur (2023). Additional file 4: of iMAP: an integrated bioinformatics and visualization pipeline for microbiome data analysis [Dataset]. http://doi.org/10.6084/m9.figshare.8637563.v1
    Available download formats
    html
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Teresia Buza; Triza Tonui; Francesca Stomeo; Christian Tiambo; Robab Katani; Megan Schilling; Beatus Lyimo; Paul Gwakisa; Isabella Cattadori; Joram Buza; Vivek Kapur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sequence processing report generated automatically by the iMAP to provide a summary of the output. The report was automatically saved in the “reports” folder as “report3_sequence_processing.html”. (HTML 4205 kb)

  4. Data analysis pipeline for investigating drug-host-microbiome relationships...

    • zenodo.org
    application/gzip, bin +2
    Updated Feb 23, 2022
    Sofia K. Forslund; Rima Chakaroun; Maria Zimmermann-Kogadeeva; Lajos Markó; Judith Aron-Wisnewsky; Trine Nielsen; Till Birkner (2022). Data analysis pipeline for investigating drug-host-microbiome relationships in cardiometabolic disease (MetaCardis cohort). [Dataset]. http://doi.org/10.5281/zenodo.5463864
    Available download formats
    application/gzip, bin, txt, tsv
    Dataset updated
    Feb 23, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sofia K. Forslund; Rima Chakaroun; Maria Zimmermann-Kogadeeva; Lajos Markó; Judith Aron-Wisnewsky; Trine Nielsen; Till Birkner
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    *******************************************************************
    MetaDrugs workflow
    *******************************************************************

    Data analysis pipeline for investigating drug-host-microbiome relationships in cardiometabolic disease (MetaCardis cohort).

    For questions and requests, please contact:
    Sofia K. Forslund (sofia.forslund@mdc-berlin.de)
    and Till Birkner (till.birkner@mdc-berlin.de)

    *******************************************************************
    Contents:
    -------------------------------------------------------------------
    Data files:
    metadata.tar.gz - archived cohort metadata files*
    input_features.tar.gz - archived preprocessed serum and urine metabolome and gut microbiome features
    output_complete.tar.gz - archived example analysis output files for each of the input feature files
    output_rerun.tar.gz - archived empty directory for generating test output files as described in this document

    *Please note: Due to conflicts with Danish Data Protection laws, metadata from the Danish subset of the cohort were removed from this repository. Please reach out to request case-by-case access to the complete set of metadata.
    -------------------------------------------------------------------
    Text files:
    archived in feature_names.tar.gz:
    atcs_names - full names for atcs drug compounds
    contrast_names - full names for disease comparison groups
    file_names - brief description of the files in input_features folder
    gmm_names - full names of GMM modules
    kegg_names - full names of KEGG modules
    ko_names - full names of KO modules
    metadata_names - full names of metadata features
    mOTU_names - species names for metagenomics data
    taxon_names - taxon names for metagenomics data
    -------------------------------------------------------------------
    Scripts:
    -------------------------------------------------------------------
    runFrame.r - main wrapper script invoking the analysis pipeline
    -------------------------------------------------------------------
    runFrame_rel_comb.r - script calculating drug combination effects
    runFrame_rel.r - script calculating dosage effects
    testCombPresenceSeparate.r - testing of significant drug combination effects beyond single drug effects
    testDosagePresenceSeparate.pl - testing of significant drug dosage effects beyond single drug effects
    testDosagePresenceSeparateNegative.pl - testing of unique drug dosage effects beyond single drug effects
    -------------------------------------------------------------------
    prettifyResults_uncollapsed.pl - wrapper script to create and format a single analysis output file
    makeTables.r - wrapper script to make excel tables with analysis results
    -------------------------------------------------------------------
    Example output file:
    -------------------------------------------------------------------
    output_all_formatted_noc_uncollapsed_complete.tsv - contains all disease-drug-host-microbiome feature analysis results in one place.
    *******************************************************************
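As a rough illustration of getting started with the archives listed above, the following Python sketch unpacks whichever of them have been downloaded into one working directory. The archive names come from the contents list; the function name `unpack_archives` and the target directory `metadrugs` are assumptions made for this example and are not part of the MetaDrugs workflow itself.

```python
import tarfile
from pathlib import Path

# Archive names are taken from the contents list above. The default target
# directory "metadrugs" is a hypothetical choice for this sketch.
ARCHIVES = [
    "metadata.tar.gz",
    "input_features.tar.gz",
    "feature_names.tar.gz",
    "output_complete.tar.gz",
    "output_rerun.tar.gz",
]


def unpack_archives(download_dir, target_dir="metadrugs"):
    """Extract every listed archive found in download_dir into target_dir.

    Returns the names of the archives that were present and extracted,
    so that missing downloads are easy to spot.
    """
    download_dir, target_dir = Path(download_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for name in ARCHIVES:
        archive = download_dir / name
        if not archive.is_file():
            continue  # skip archives that have not been downloaded
        # Only extract archives obtained from a source you trust.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target_dir)
        extracted.append(name)
    return extracted
```

Calling `unpack_archives("downloads")` would return the list of archives it found, which can be compared against `ARCHIVES` to check that nothing is missing before running the R and Perl scripts.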

  5. Additional file 1: of iMAP: an integrated bioinformatics and visualization...

    • springernature.figshare.com
    xlsx
    Updated Jun 4, 2023
    Teresia Buza; Triza Tonui; Francesca Stomeo; Christian Tiambo; Robab Katani; Megan Schilling; Beatus Lyimo; Paul Gwakisa; Isabella Cattadori; Joram Buza; Vivek Kapur (2023). Additional file 1: of iMAP: an integrated bioinformatics and visualization pipeline for microbiome data analysis [Dataset]. http://doi.org/10.6084/m9.figshare.8637539.v1
    Available download formats
    xlsx
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Teresia Buza; Triza Tonui; Francesca Stomeo; Christian Tiambo; Robab Katani; Megan Schilling; Beatus Lyimo; Paul Gwakisa; Isabella Cattadori; Joram Buza; Vivek Kapur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Format of input files. Includes sample-metadata mapping (sheet 1), sample-read-file mapping in mothur format (sheet 2), and sample-variable mapping (sheets 3, 4 and 5). (XLSX 69 kb)

  6. Microbiome Sequencing Services Market Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Feb 26, 2025
    Data Insights Market (2025). Microbiome Sequencing Services Market Report [Dataset]. https://www.datainsightsmarket.com/reports/microbiome-sequencing-services-market-8882
    Available download formats
    ppt, pdf, doc
    Dataset updated
    Feb 26, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global microbiome sequencing services market is experiencing robust growth, with a market size of $1.71 billion in 2025 and a projected Compound Annual Growth Rate (CAGR) of 6.70% from 2025 to 2033. This expansion is driven by several key factors. Advancements in sequencing technologies, such as Sequencing by Ligation (SBL), Sequencing by Synthesis (SBS), Shotgun Sequencing, and Targeted Gene Sequencing, are reducing costs and increasing throughput, making microbiome analysis more accessible for research and clinical applications. The rising prevalence of chronic diseases like gastrointestinal disorders, infectious diseases, CNS diseases, and cancer, coupled with a growing understanding of the microbiome's role in these conditions, fuels demand for these services. Furthermore, increasing investments in research and development, coupled with the growing adoption of personalized medicine approaches which leverage microbiome data for diagnosis and treatment, are significant drivers. Key market trends include the emergence of cloud-based microbiome analysis platforms, the development of novel bioinformatics tools for data interpretation, and the increasing integration of microbiome sequencing into clinical workflows. However, challenges remain, including the high cost of advanced sequencing technologies, the complexity of data analysis, and the lack of standardized protocols for microbiome research, which act as market restraints. The market is segmented by technology and application, with Sequencing by Synthesis (SBS) currently dominating the technology segment, and Gastrointestinal Diseases and Oncology leading the application segment. Geographically, North America and Europe currently hold significant market shares, driven by robust healthcare infrastructure and substantial research funding. 
The competitive landscape is characterized by a mix of established players and emerging companies, including ZIFO, Baseclear BV, Metabiomics, Zymo Research, Microbiome Insights Inc, CosmosID, Shanghai Realbio Technology (RBT) Co Ltd, Rancho Biosciences, Merieux Nutrisciences Corporations (Biofortis), Clinical Microbiomics AS, MR DNA, and Locus Biosciences (EPIBIOME), among others. These companies are actively engaged in developing innovative technologies, expanding their service offerings, and forging strategic partnerships to gain a competitive edge. The market is expected to witness increased consolidation and strategic acquisitions in the coming years. Future growth will be significantly influenced by the development of more accurate and cost-effective sequencing technologies, the expansion of clinical applications, the establishment of standardized data analysis pipelines, and the growing adoption of microbiome-based therapeutics. The Asia Pacific region presents a significant growth opportunity due to rising healthcare expenditure, increasing awareness of microbiome research, and a growing prevalence of chronic diseases. Continued research into the complex interplay between the microbiome and human health will undoubtedly shape the future trajectory of this rapidly expanding market, driving further innovation and market penetration across various geographical regions and application areas. This report provides a detailed analysis of the Microbiome Sequencing Services market, projected to reach multi-billion dollar valuations in the coming years. It examines market concentration, key trends, dominant segments, leading players, and significant recent developments. 
Recent developments include: November 2023: QIAGEN NV launched the Microbiome WGS (whole-genome sequencing) SeqSets, a comprehensive Sample to Insight workflow designed to provide an easy-to-use solution that maximizes efficiency and reproducibility in microbiome research; June 2023: Zymo Research launched its full-length 16S sequencing service, offering researchers high-quality, full-length 16S rRNA gene sequencing for microbiome analysis. Key drivers for this market are: huge investment in microbiome research; rise in demand for NGS services; surge in genomic research and widening application area of microbiome sequencing. Potential restraints include: ethical and legal issues related to genome sequencing; lack of skilled technicians for NGS data analysis. Notable trends are: the oncology segment is expected to hold a significant market share over the forecast period.
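The description quotes a 2025 market size and a CAGR but not the implied end-of-period figure. As a minimal worked example, the sketch below compounds the quoted base value forward. It assumes eight annual compounding periods between 2025 and 2033, which the report does not state explicitly; the function name `project_market_size` is hypothetical.

```python
def project_market_size(base_value, cagr, years):
    """Compound base_value forward by `years` at annual growth rate `cagr`."""
    return base_value * (1.0 + cagr) ** years


# Figures quoted in the description: $1.71 billion in 2025, CAGR of 6.70%.
BASE_2025_BILLION_USD = 1.71
CAGR = 0.067

# Assumption: eight annual compounding periods between 2025 and 2033.
projected_2033 = project_market_size(BASE_2025_BILLION_USD, CAGR, 2033 - 2025)
print(f"Implied 2033 market size: ${projected_2033:.2f} billion")
```

Under these assumptions the implied 2033 value is roughly $2.87 billion, which is an arithmetic consequence of the two quoted numbers rather than a figure taken from the report.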

  7. Additional file 5: of iMAP: an integrated bioinformatics and visualization...

    • springernature.figshare.com
    html
    Updated May 31, 2023
    Teresia Buza; Triza Tonui; Francesca Stomeo; Christian Tiambo; Robab Katani; Megan Schilling; Beatus Lyimo; Paul Gwakisa; Isabella Cattadori; Joram Buza; Vivek Kapur (2023). Additional file 5: of iMAP: an integrated bioinformatics and visualization pipeline for microbiome data analysis [Dataset]. http://doi.org/10.6084/m9.figshare.8637575.v1
    Available download formats
    html
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Teresia Buza; Triza Tonui; Francesca Stomeo; Christian Tiambo; Robab Katani; Megan Schilling; Beatus Lyimo; Paul Gwakisa; Isabella Cattadori; Joram Buza; Vivek Kapur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Preliminary analysis report generated automatically by the iMAP to provide a summary of conserved taxonomy assigned to OTUs and the initial analysis of OTUs and taxa data. The preliminary analysis report was automatically saved in the “reports” folder as “report4_preliminary_analysis.html”. (HTML 20379 kb)

  8. MicrobiomeHD: the human gut microbiome in health and disease

    • zenodo.org
    • search.datacite.org
    application/gzip
    Updated Jan 24, 2020
    Claire Duvallet; Sean Gibbons; Thomas Gurry; Rafael Irizarry; Eric Alm (2020). MicrobiomeHD: the human gut microbiome in health and disease [Dataset]. http://doi.org/10.5281/zenodo.569601
    Available download formats
    application/gzip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Claire Duvallet; Sean Gibbons; Thomas Gurry; Rafael Irizarry; Eric Alm
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Overview

    MicrobiomeHD is a standardized database of human gut microbiome studies in health and disease. This database includes publicly available 16S data from published case-control studies and their associated patient metadata. Raw sequencing data for each study was downloaded and processed through a standardized pipeline.

    To be included in MicrobiomeHD, datasets have:

    • publicly available raw sequencing data (fastq or fasta)
    • publicly available metadata with at least case and control labels for each patient
    • at least 15 case patients
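The three inclusion criteria above can be expressed as a simple check. The sketch below is only an illustration: the `CandidateDataset` record and its field names are hypothetical and do not reflect the actual MicrobiomeHD metadata schema.

```python
from dataclasses import dataclass
from typing import Optional

MIN_CASE_PATIENTS = 15  # threshold stated in the inclusion criteria


@dataclass
class CandidateDataset:
    # Hypothetical record; field names are illustrative only.
    dataset_id: str
    raw_data_format: Optional[str]  # "fastq", "fasta", or None if not public
    has_case_control_labels: bool   # metadata labels each patient as case/control
    n_cases: int                    # number of case patients


def meets_inclusion_criteria(ds):
    """Return True if the dataset satisfies all three listed criteria."""
    return (
        ds.raw_data_format in {"fastq", "fasta"}
        and ds.has_case_control_labels
        and ds.n_cases >= MIN_CASE_PATIENTS
    )
```

For example, `meets_inclusion_criteria(CandidateDataset("study_a", "fastq", True, 20))` evaluates to `True`, while the same record with 14 case patients would be rejected.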

    Currently, MicrobiomeHD is focused on stool samples. Additional samples may be included in certain datasets, as indicated in the metadata.

    Files

    Additional information about the datasets included in this MicrobiomeHD release is in the MicrobiomeHD github repo https://github.com/cduvallet/microbiomeHD, in the file db/dataset_info.yaml. Top-level identifiers correspond to the dataset IDs used in Duvallet et al. 2017. Sample sizes in the yaml file are those that were described in the papers, and may not exactly reflect the actual data (due to missing/extra data, samples which didn't pass quality control, etc).

    Each dataset was downloaded and processed through a standardized pipeline. The raw processing results are available in the *.tar.gz files here. Each file has the same directory structure and files, as described in the pipeline documentation: http://amplicon-sequencing-pipeline.readthedocs.io/en/latest/output.html.

    Specific files of interest include:

    • summary_file.txt: this file contains a summary of all parameters used to process the data
    • datasetID.metadata.txt: the metadata associated with the samples. Note that some samples in the metadata may not have sequencing data, and vice versa.
    • RDP/datasetID.otu_table.100.denovo.rdp_assigned: the 100% OTU tables with Latin taxonomic names assigned using the RDP classifier.
    • datasetID.otu_seqs.100.fasta: representative sequences for each OTU in the 100% OTU table. OTU labels in the OTU table end with d_denovoID - these denovoIDs correspond to the sequences in this file.

    Processing

    The raw data was acquired as described in the supplementary materials of Duvallet et al.'s "Meta analysis of microbiome studies identifies shared and disease-specific patterns".

    Raw sequencing data was processed with the Alm lab's in-house 16S processing pipeline: https://github.com/thomasgurry/amplicon_sequencing_pipeline

    Pipeline documentation is available at: http://amplicon-sequencing-pipeline.readthedocs.io/

    Metadata was extracted from the original papers and/or data sources, and formatted manually.

    Contributing

    MicrobiomeHD is a resource that can be used to extract disease-specific microbiome signals in individual case-control studies. Many microbes respond non-specifically to health and disease, and the majority of bacterial associations within individual studies overlap with this "core" response. Researchers should cross-check their results with the data presented here to ensure that their identified microbial associations are specific to their disease under study.

    We provide an updated list of "core" microbes here, as well as the raw OTU tables for anyone who wishes to reproduce and adapt this analysis to their study question.

    If you would like to include your case-control dataset in MicrobiomeHD, please email duvallet[at]mit.edu.

    For us to process your data through our standard pipeline, you will need to provide the following files and information about your data:

    • raw sequencing data in fastq or fasta format (preferably fastq)
    • information about which processing steps will be required (e.g. removing primers or barcodes, merging paired-end reads, etc)
    • sample IDs associated with the sequencing data (either mapped to barcodes still in the sequences, or to each de-multiplexed sequencing file)
    • case/control metadata of each sample
    • other relevant metadata (e.g. sampling site, if not all samples are stool; sampling time point, if multiple samples per patient were taken; etc)

    By using MicrobiomeHD in your own analyses, you agree to contribute your dataset to this database and to make your raw sequencing data (i.e. fastq files) publicly available.

    Citing MicrobiomeHD

    The MicrobiomeHD database and original publications for each of these datasets are described in Duvallet et al. (2017): http://biorxiv.org/content/early/2017/05/08/134031

    If you use any of these datasets in your analysis, please cite both MicrobiomeHD (Duvallet et al. (2017)) and the original publication for each dataset that you use.

    The code used to process and analyze this data in Duvallet et al. (2017) is available on github: https://github.com/cduvallet/microbiomeHD

    Files

    Core genera

    file-S3.core_genera.txt: Supplemental Table 3 from Duvallet et al. (2017), listing the core health- and disease-associated microbes.

    Datasets

    Note that MicrobiomeHD contains all 28 datasets from Duvallet et al. (2017), as well as additional datasets which did not meet the inclusion criteria for the meta-analysis presented in the paper. Additional information about the datasets included in this MicrobiomeHD release is in the original publications and the MicrobiomeHD github repo https://github.com/cduvallet/microbiomeHD, in the file db/dataset_info.yaml.

    The sample sizes listed here reflect what was reported in the original publications. Some may have discrepancies between what is reported and what is in the actual data due to missing data, quality issues, barcode mismatches, etc.
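As a sketch of how one of the raw OTU tables in this release might be read and normalized, the example below parses a tab-delimited table and converts counts to per-sample relative abundances. It assumes OTU labels in the first column and one column per sample; the real files may be oriented differently, so check the pipeline documentation before relying on this layout. The function names and the tiny example table are made up for illustration.

```python
import csv
import io


def read_otu_table(handle):
    """Parse a tab-delimited OTU table into {sample: {otu: count}}.

    Assumes the first column holds OTU labels and each remaining column
    is one sample; adjust if the real files are oriented differently.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    table = {sample: {} for sample in samples}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        otu, counts = row[0], row[1:]
        for sample, count in zip(samples, counts):
            table[sample][otu] = float(count)
    return table


def relative_abundance(table):
    """Convert raw counts to per-sample proportions."""
    result = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        result[sample] = {
            otu: (count / total if total else 0.0)
            for otu, count in counts.items()
        }
    return result


# Tiny made-up table standing in for a file such as
# RDP/datasetID.otu_table.100.denovo.rdp_assigned.
EXAMPLE = "otu\tsampleA\tsampleB\nd_denovo1\t30\t0\nd_denovo2\t10\t5\n"
abund = relative_abundance(read_otu_table(io.StringIO(EXAMPLE)))
print(abund["sampleA"]["d_denovo1"])  # 0.75
```

To use a real file, replace the `io.StringIO(EXAMPLE)` handle with `open(path, newline="")`. The standard-library `csv` module is used here only to keep the sketch dependency-free.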

  9. Microbiome Sequencing Services Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 15, 2025
    Data Insights Market (2025). Microbiome Sequencing Services Report [Dataset]. https://www.datainsightsmarket.com/reports/microbiome-sequencing-services-1772262
    Available download formats
    doc, ppt, pdf
    Dataset updated
    May 15, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global microbiome sequencing services market is experiencing robust growth, driven by the increasing understanding of the microbiome's role in human health and disease. Advancements in sequencing technologies, such as next-generation sequencing (NGS), are significantly reducing costs and increasing throughput, making microbiome analysis more accessible to researchers, pharmaceutical companies, and healthcare providers. The pharmaceutical and biotech sectors are major drivers, leveraging microbiome sequencing to identify novel drug targets and develop personalized therapies for various conditions, including gastrointestinal disorders, autoimmune diseases, and even cancer. Academic institutions are also contributing significantly to the market's expansion through fundamental research and the development of innovative analytical tools. Regulatory support and increased funding for microbiome research further bolster market growth. While the market is currently dominated by sequencing by synthesis (SBS) methods, other technologies like sequencing by ligation are gaining traction due to their potential for specific applications. The market exhibits significant regional variations, with North America and Europe currently holding the largest market share due to the presence of well-established research infrastructure and a high concentration of key players. However, the Asia-Pacific region is projected to witness the fastest growth in the coming years, driven by increasing healthcare spending and rising awareness of microbiome-related health issues. Challenges remain, primarily related to data analysis and interpretation. The sheer volume of data generated by microbiome sequencing requires sophisticated bioinformatics tools and expertise for accurate and meaningful insights. Furthermore, standardization of protocols and data analysis pipelines is crucial for ensuring reproducibility and comparability of results across different studies and laboratories. 
Despite these hurdles, the market is poised for sustained growth, propelled by ongoing technological innovation, the increasing adoption of microbiome-based diagnostics and therapeutics, and a growing understanding of the complex interplay between the microbiome and human health. The diverse applications across research, diagnostics, and therapeutics suggest a broad and expanding market with significant future potential, particularly in personalized medicine and precision healthcare.

  10. Microplastics Fish Gut Microbiome Data For EDA/ML

    • kaggle.com
    zip
    Updated Jul 19, 2025
    ISMAILDRISSI25 (2025). Microplastics Fish Gut Microbiome Data For EDA/ML [Dataset]. https://www.kaggle.com/datasets/ismaildrissi25/microplastics-fish-gut-microbiome-data-for-ml
    Available download formats
    zip (252677 bytes)
    Dataset updated
    Jul 19, 2025
    Authors
    ISMAILDRISSI25
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    This dataset was compiled for a Master's thesis project focused on investigating the gut microbiota response in fish exposed to microplastics. It contains cleaned and annotated metadata along with taxonomic abundance information and exposure features, prepared for predictive machine learning modeling.

    Context

    Microplastics (MPs) are emerging pollutants in aquatic ecosystems. Numerous studies have shown that MPs can impact the gut microbial composition of fish. This dataset integrates data from multiple studies through a meta-analysis approach, standardized using bioinformatics and machine learning pipelines.

    Source

    Sequences and metadata were extracted from public BioProject entries in the NCBI SRA database.

    Data processing: QIIME2, Python (pandas, scikit-learn), Google Colab

    Total size: ~648 FASTQ files → summarized into machine learning-ready tabular format

    Applications

    • Microbiome classification modeling
    • Environmental ecotoxicology analysis
    • Meta-analysis benchmarking
    • Feature importance and interpretability (SHAP, feature selection)

  11. Table_1_Statistical and Machine Learning Techniques in Human Microbiome...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Feb 22, 2021
    Gundogdu, Aycan; Stres, Blaz; Klammsteiner, Thomas; Pongor, Sándor; Nedyalkova, Miroslava; Hron, Karel; Gómez-Cabrero, David; Truică, Ciprian-Octavian; Sampri, Alexia; Marcos-Zambrano, Laura Judith; Yilmaz, Ercument; Zeller, Georg; Roshchupkin, Gennady; Truu, Jaak; May, Patrick; Lahti, Leo; Vlachakis, Dimitrios; Promponas, Vasilis J.; Elbere, Ilze; Suharoschi, Ramona; Bakir-Gungor, Burcu; Pašić, Lejla; Pio, Gianvito; Adilovic, Muhamed; Marques, Cláudia; Falquet, Laurent; D’Elia, Domenica; Claesson, Marcus J.; Saez-Rodriguez, Julio; Desai, Mahesh S.; Santa Pau, Enrique Carrillo-de; Lopes, Marta B.; Vilne, Baiba; Zomer, Aldert L.; Mason, Michael; Shigdel, Rajesh; Przymus, Piotr; Aydemir, Onder; Moreno-Indias, Isabel (2021). Table_1_Statistical and Machine Learning Techniques in Human Microbiome Studies: Contemporary Challenges and Solutions.XLSX [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000787717
    Dataset updated
    Feb 22, 2021
    Authors
    Gundogdu, Aycan; Stres, Blaz; Klammsteiner, Thomas; Pongor, Sándor; Nedyalkova, Miroslava; Hron, Karel; Gómez-Cabrero, David; Truică, Ciprian-Octavian; Sampri, Alexia; Marcos-Zambrano, Laura Judith; Yilmaz, Ercument; Zeller, Georg; Roshchupkin, Gennady; Truu, Jaak; May, Patrick; Lahti, Leo; Vlachakis, Dimitrios; Promponas, Vasilis J.; Elbere, Ilze; Suharoschi, Ramona; Bakir-Gungor, Burcu; Pašić, Lejla; Pio, Gianvito; Adilovic, Muhamed; Marques, Cláudia; Falquet, Laurent; D’Elia, Domenica; Claesson, Marcus J.; Saez-Rodriguez, Julio; Desai, Mahesh S.; Santa Pau, Enrique Carrillo-de; Lopes, Marta B.; Vilne, Baiba; Zomer, Aldert L.; Mason, Michael; Shigdel, Rajesh; Przymus, Piotr; Aydemir, Onder; Moreno-Indias, Isabel
    Description

    The human microbiome has emerged as a central research topic in human biology and biomedicine. Current microbiome studies generate high-throughput omics data across different body sites, populations, and life stages. While many of the challenges in microbiome research are similar to those of other high-throughput studies, the quantitative analyses need to address the heterogeneity of data, specific statistical properties, and the remarkable variation in microbiome composition across individuals and body sites. This has led to a broad spectrum of statistical and machine learning challenges that range from study design, data processing, and standardization to analysis, modeling, cross-study comparison, prediction, data science ecosystems, and reproducible reporting. Nevertheless, although many statistical and machine learning approaches and tools have been developed, new techniques are needed to deal with emerging applications and the vast heterogeneity of microbiome data. We review and discuss emerging applications of statistical and machine learning techniques in human microbiome studies and introduce the COST Action CA18131 “ML4Microbiome” that brings together microbiome researchers and machine learning experts to address current challenges such as standardization of analysis pipelines for reproducibility of data analysis results, benchmarking, improvement, or development of existing and new tools and ontologies.

  12. Mapping human microbiome drug metabolism by gut bacteria and their genes

    • data.niaid.nih.gov
    xml
    Updated Jul 10, 2019
    Cite
    Michael Zimmermann (2019). Mapping human microbiome drug metabolism by gut bacteria and their genes [Dataset]. https://data.niaid.nih.gov/resources?id=mtbls896
    Available download formats: xml
    Dataset updated
    Jul 10, 2019
    Dataset provided by
    National Institute of Allergy and Infectious Diseases (http://www.niaid.nih.gov/)
    Authors
    Michael Zimmermann
    Variables measured
    pH, Phylum, Timepoint, Multiomics, Catalogue ID, Metabolomics, Treatment group, Common name of organism
    Description

    Individuals vary widely in their drug responses, which can be dangerous and expensive due to treatment delays and adverse effects. Growing evidence implicates the gut microbiome in this variability; however, the molecular mechanisms remain largely unknown. We measured the ability of 76 diverse human gut bacteria to metabolize 271 oral drugs and found that many of these drugs are chemically modified by microbes. We combined high-throughput genetics with mass spectrometry to systematically identify drug-metabolizing microbial gene products. These microbiome-encoded enzymes can directly and significantly impact intestinal and systemic drug metabolism in mice, and can explain drug-metabolizing activities of human gut bacteria and communities based on their genomic contents. These causal links between microbiota gene content and metabolic activities connect interpersonal microbiome variability to interpersonal differences in drug metabolism, which has implications for medical therapy and drug development across multiple disease indications.

    Additional data related to this study can also be found at the following links:
    - Raw sequencing data: ENA (accession no. PRJEB31790)
    - Data for figures: FigShare
    - Analysis pipeline schemes, scripts and input files for analyzing data and generating figures: GitHub and archived Zenodo

  13. Supplementary Datasets for dadasnake workflow

    • zenodo.org
    application/gzip, bin +1
    Updated Nov 2, 2020
    Cite
    Anna Heintz-Buschart; Anna Heintz-Buschart (2020). Supplementary Datasets for dadasnake workflow [Dataset]. http://doi.org/10.5281/zenodo.3826697
    Available download formats: bin, tsv, application/gzip
    Dataset updated
    Nov 2, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anna Heintz-Buschart; Anna Heintz-Buschart
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains configuration and results files for the proof-of-principle of the dadasnake pipeline. Includes tables with the composition of ground-truth data or mock-communities.

    dadasnake wraps pre-processing of sequencing reads, delineation of exact sequencing variants using the favorably benchmarked, widely used DADA2 algorithm, taxonomic classification and post-processing of the resultant tables, and hand-off in standard formats, into a user-friendly, one-command Snakemake pipeline. The suitability of the provided default configurations is demonstrated using mock-community data from bacteria and archaea, as well as fungi. By use of Snakemake, dadasnake makes efficient use of high-performance computing infrastructures. Easy user configuration guarantees flexibility of all steps, including the processing of data from multiple sequencing platforms. dadasnake facilitates easy installation via conda environments. dadasnake is available at https://github.com/a-h-b/dadasnake .

  14. Post-epizootic microbiome associations across communities of neotropical...

    • datadryad.org
    • search.dataone.org
    • +1more
    zip
    Updated Apr 1, 2021
    Cite
    Phillip Jervis; Pol Pintanel; Kevin Hopkins; Claudia Wierzbicki; Jennifer Shelton; Emily Skelly; Goncalo Rosa; Diego Almeida-Reinoso; Maria Eugenia-Ordonez; Santiago Ron; Xavier Harrison; Andres Merino-Viteri; Matthew Fisher (2021). Post-epizootic microbiome associations across communities of neotropical amphibians [Dataset]. http://doi.org/10.5061/dryad.pg4f4qrnb
    Available download formats: zip
    Dataset updated
    Apr 1, 2021
    Dataset provided by
    Dryad
    Authors
    Phillip Jervis; Pol Pintanel; Kevin Hopkins; Claudia Wierzbicki; Jennifer Shelton; Emily Skelly; Goncalo Rosa; Diego Almeida-Reinoso; Maria Eugenia-Ordonez; Santiago Ron; Xavier Harrison; Andres Merino-Viteri; Matthew Fisher
    Time period covered
    Mar 1, 2021
    Description

    Post‐epizootic microbiome associations across communities of neotropical amphibians README

    File structure:

    Pipelines Ecuador_pipeline (contains bacterial data and bioinformatic pipeline) Ecuador_pipeline ITS (contains fungal data and bioinformatic pipeline)

    Statistical analysis Question 1 (contains all data and script for analysis of BdqPCR data) Question 2 (contains all data and scripts for analysis of bacterial data and fungal data individually) Dual kingdom analysis (script for analysis of both datasets within a single microbiome)

    Subfolder contents:

    Pipelines Ecuador pipeline Database (Reference training set for bacterial taxonomic IDs) Fastq_plate1 (raw r16S data) Fastq_plate2 (raw r16S data) ...

  15. Bioinformatic pipeline: Genomic diversity landscape of the honey bee gut...

    • zenodo.org
    application/gzip, txt
    Updated Jan 24, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Kirsten Ellegaard; Kirsten Ellegaard; Philipp Engel; Philipp Engel (2020). Bioinformatic pipeline: Genomic diversity landscape of the honey bee gut microbiota [Dataset]. http://doi.org/10.5281/zenodo.1479668
    Available download formats: application/gzip, txt
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kirsten Ellegaard; Kirsten Ellegaard; Philipp Engel; Philipp Engel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data-set describes the full bioinformatic pipeline used to analyze 54 metagenomic samples of the honey bee gut microbiota. Each sample was isolated from an individual honey bee, and all samples originate from two colonies of the Engel laboratory at the University of Lausanne, Switzerland. The full raw data-set is available from the sequence-read archive: SRP150166.

    A publication based on this analysis is currently under review, with the title: "Genomic diversity landscape of the honey bee gut microbiota", and an upload to Biorxiv is also underway.

    The data-set contains tar-balls for the different main workflows of the analysis. Download and unpack to view the contents (tar -zxvf filename.tar.gz). For each workflow, all directories contain README.txt files, describing the contents of the directory. Due to size constraints, some intermediate files have been omitted, and some workflows are demonstrated for a subset of the data. However, the full analysis can be reproduced from the raw data, using the provided scripts.

    Scripts are included within workflow directories, and are also provided as a separate tar-ball for convenience. All perl-scripts come with documentation, which can be viewed by typing: "perl script_name.pl -h". For R scripts, the usage is indicated as a comment in the top lines of each script. Note that many of the scripts require specific input-files to be present in the run-directory. Their usage is demonstrated within the workflow directories in bash-scripts (*.sh). Commands used for generating plots and some statistics are given within workflow directories in text-files "R.commands" when applicable.

    Aside from custom code, the pipeline also utilizes various open-source Software packages, which are detailed in the file "software_dependencies.txt". Note, while many of the scripts will run fast on any computer, some steps of the pipeline are computationally demanding, and will require significant computing time, as well as storage space. When scripts are known to be time-consuming, this is indicated in the script help message.

  16. Data from: A novel method to assess the integrity of frozen archival DNA...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Aug 10, 2024
    Cite
    Krista Sumby; John Stephen; Jeremy Austin; Rhiannon Schilling; Timothy Cavagnaro (2024). A novel method to assess the integrity of frozen archival DNA samples: Alpha-diversity ratios of short and long-read 16S rRNA gene sequences [Dataset]. http://doi.org/10.5061/dryad.v9s4mw73t
    Dataset updated
    Aug 10, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Krista Sumby; John Stephen; Jeremy Austin; Rhiannon Schilling; Timothy Cavagnaro
    Description

    Archival DNA samples collected and analysed for a range of research and applied questions have accumulated in the laboratories of universities, government agencies, and commercial service providers for decades. These DNA archives represent a valuable, yet largely untapped repository of genomic information. With lowering costs of, and increasing access to, high-throughput sequencing, we predict an increase in retrospective research to explore the wealth of information that resides in these archival samples. However, for this to occur, we need confidence in the integrity of the DNA samples, often stored under sub-optimal conditions and their fitness of purpose for downstream genomic analysis. Here, we borrow from a well-established concept in ancient DNA to evaluate sample integrity, defined as loss of information content in recovered amplicons, of frozen DNA samples and based on the ratio of α-diversity of short and long-read 16S rRNA gene sequences. The 16S rRNA variable region of eight...

    Data analysis: The Pacific Biosciences Nextflow pipeline (https://github.com/PacificBiosciences/pb-16S-nf) was followed for initial data processing. Raw reads were processed, including demultiplexing by “q2-demux” in QIIME2, and quality control was assessed with q2-cutadapt. Quantitative Insights Into Microbial Ecology 2 (QIIME2 v. 2018.11) software was used to analyse the trimmed reorientated sequences (Bolyen et al., 2019). The DADA2 denoising option (Callahan et al., 2016) was selected to pick up the representative reads for generating an amplicon sequence variants (ASVs) table. ASVs generated from DADA2 were classified using the Naive Bayes classifier and SILVA reference database version 138.1 (Quast et al., 2013). For analysis between the platforms the feature table of each platform was merged, as were the representative sequences post-DADA2 denoising with QIIME2 before building the phylogenetic tree and assigning taxonomy.

    Taxonomic diversity analysis: All analysis was conducted wit...

    A novel method to assess the integrity of frozen archival DNA samples: Alpha-diversity ratios of short and long-read 16S rRNA gene sequences

    https://doi.org/10.5061/dryad.v9s4mw73t

    We utilized DNA extracted from various agricultural soils that were stored at -20°C in a gene bank freezer room over 20 years by the South Australian Research and Development Institute (SARDI). This DNA was collected through the PREDICTA® B DNA-based soil disease testing service for broadacre farming (PREDICTA® B). We selected 87 soil DNA extracts from three Australian states (regions), spanning 10 distinct time bins between 2001 and 2020. Our primary concern was the potential DNA degradation in the oldest samples. Therefore, we included samples from the first four years (2001-2004) and selected samples more sporadically from subsequent years (2005 onwards). Alpha-diversity ratios, using Shannon's diversity index, were calculated to determine if there was a d...

  17. Data from: Drivers of change and stability in the gut microbiota of an...

    • datadryad.org
    • search.dataone.org
    • +1more
    zip
    Updated Jul 8, 2021
    Cite
    Sasha Pekarsky; Ammon Corl; Sondra Turjeman; Pauline Kamath; Wayne Getz; Bowie Rauri; Yuri Markin; Ran Nathan (2021). Drivers of change and stability in the gut microbiota of an omnivorous avian migrant exposed to artificial food supplementation [Dataset]. http://doi.org/10.5061/dryad.02v6wwq3m
    Available download formats: zip
    Dataset updated
    Jul 8, 2021
    Dataset provided by
    Dryad
    Authors
    Sasha Pekarsky; Ammon Corl; Sondra Turjeman; Pauline Kamath; Wayne Getz; Bowie Rauri; Yuri Markin; Ran Nathan
    Time period covered
    Jul 5, 2021
    Description

    Fecal samples were collected for microbiome analysis. PCR amplification, library preparation and sequencing of the 16S V4 region for each sample were conducted at the Argonne Sequencing Center at Argonne National Laboratory (Lemont, IL). QIIME 2 was used to demultiplex the raw sequence data and DADA2 was used to infer amplicon sequence variants.

    Movement data was collected using GPS-GSM transmitters on free ranging cranes that visited sampled fields up to three days prior to fecal sample collection for host-associated bacterial analysis. Habitat annotation was done using satellite imagery from Sentinel-2 in Russia and GIS information provided by the Ministry of Agriculture and Rural Development in Israel.

  18. MGnify (Analyses)

    • ebi.ac.uk
    Updated Dec 1, 2015
    + more versions
    Cite
    (2015). MGnify (Analyses) [Dataset]. https://www.ebi.ac.uk/ebisearch/data-coverage
    Dataset updated
    Dec 1, 2015
    Description

    MGnify offers pipelines for the analysis and archiving of microbiome data to help determine the taxonomic diversity and functional & metabolic potential of environmental samples. Projects/Studies represent collections of samples and experiments (runs) applied to these samples.

  19. Data from: Uncovering Hidden Members and Functions of the Soil Microbiome...

    • datasetcatalog.nlm.nih.gov
    • acs.figshare.com
    Updated Jul 6, 2022
    Cite
    Mitchell, Hugh D.; Lee, Joon-Yong; Burnet, Meagan C.; Nakayasu, Ernesto S.; Jansson, Janet K.; Jenson, Sarah C.; Burnum-Johnson, Kristin E.; Wu, Ruonan; Nicora, Carrie D.; Merkley, Eric D.; Payne, Samuel H. (2022). Uncovering Hidden Members and Functions of the Soil Microbiome Using De Novo Metaproteomics [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000388603
    Dataset updated
    Jul 6, 2022
    Authors
    Mitchell, Hugh D.; Lee, Joon-Yong; Burnet, Meagan C.; Nakayasu, Ernesto S.; Jansson, Janet K.; Jenson, Sarah C.; Burnum-Johnson, Kristin E.; Wu, Ruonan; Nicora, Carrie D.; Merkley, Eric D.; Payne, Samuel H.
    Description

    Metaproteomics has been increasingly utilized for high-throughput characterization of proteins in complex environments and has been demonstrated to provide insights into microbial composition and functional roles. However, significant challenges remain in metaproteomic data analysis, including creation of a sample-specific protein sequence database. A well-matched database is a requirement for successful metaproteomics analysis, and the accuracy and sensitivity of PSM identification algorithms suffer when the database is incomplete or contains extraneous sequences. When matched DNA sequencing data of the sample is unavailable or incomplete, creating the proteome database that accurately represents the organisms in the sample is a challenge. Here, we leverage a de novo peptide sequencing approach to identify the sample composition directly from metaproteomic data. First, we created a deep learning model, Kaiko, to predict the peptide sequences from mass spectrometry data and trained it on 5 million peptide–spectrum matches from 55 phylogenetically diverse bacteria. After training, Kaiko successfully identified organisms from soil isolates and synthetic communities directly from proteomics data. Finally, we created a pipeline for metaproteome database generation using Kaiko. We tested the pipeline on native soils collected in Kansas, showing that the de novo sequencing model can be employed as an alternative and complementary method to construct the sample-specific protein database instead of relying on (un)matched metagenomes. Our pipeline identified all highly abundant taxa from 16S rRNA sequencing of the soil samples and uncovered several additional species which were strongly represented only in proteomic data.

  20. Additional file 2 of tidyMicro: a pipeline for microbiome data analysis and...

    • springernature.figshare.com
    txt
    Updated Jun 4, 2023
    + more versions
    Cite
    Charlie M. Carpenter; Daniel N. Frank; Kayla Williamson; Jaron Arbet; Brandie D. Wagner; Katerina Kechris; Miranda E. Kroehl (2023). Additional file 2 of tidyMicro: a pipeline for microbiome data analysis and visualization using the tidyverse in R [Dataset]. http://doi.org/10.6084/m9.figshare.13685090.v1
    Available download formats: txt
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Charlie M. Carpenter; Daniel N. Frank; Kayla Williamson; Jaron Arbet; Brandie D. Wagner; Katerina Kechris; Miranda E. Kroehl
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 2. Model estimates table. Column 1: Taxa names. Column 2: Model coefficients. Column 3: Estimated rate ratios from exponentiated β estimates. For models with interaction terms, the appropriate β estimates are summed before being exponentiated. Column 4: Exponentiated 95% Wald confidence intervals. For models with interaction terms, the appropriate β estimates and covariance terms are summed for the Wald intervals. Column 5: Z-statistics from β estimates. Column 6: False discovery rate adjusted p-value
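
    The column definitions above can be illustrated with a small sketch. It is not taken from tidyMicro; the function name, argument layout, and the fixed normal quantile are assumptions made for illustration. It sums the relevant β estimates, uses the sum of all entries of their covariance matrix as the variance of that sum, and exponentiates the estimate and the Wald limits.

```python
import math

# Hypothetical helper (not part of tidyMicro): rate ratio, 95% Wald interval
# and Z-statistic for a sum of model coefficients.
def rate_ratio_with_wald_ci(betas, cov, z=1.959964):
    """betas: coefficients to be summed (one value when there is no interaction).
    cov: covariance matrix of those coefficients, as a list of lists.
    z: standard normal quantile for the interval (about 1.96 for 95%)."""
    n = len(betas)
    beta_sum = sum(betas)
    # Var(sum of betas) = sum of variances + 2 * sum of covariances,
    # which equals the sum of every entry in the covariance matrix.
    var_sum = sum(cov[i][j] for i in range(n) for j in range(n))
    se = math.sqrt(var_sum)
    return {
        "rate_ratio": math.exp(beta_sum),
        "ci_lower": math.exp(beta_sum - z * se),
        "ci_upper": math.exp(beta_sum + z * se),
        "z_stat": beta_sum / se,
    }

# Example with made-up numbers: a main effect plus an interaction term.
result = rate_ratio_with_wald_ci([0.5, -0.5], [[0.04, 0.01], [0.01, 0.04]])
```

    The false discovery rate adjustment in Column 6 is a separate step applied across taxa and is not covered by this sketch.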

