Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets from the American Gut Project (AGP) that were used by Ullmann et al. for the research project "Over-optimism in unsupervised microbiome analysis".
The datasets were published by McDonald et al.:
McDonald D, Hyde E, Debelius JW, Morton JT, Gonzalez A, Ackermann G,
et al. American gut: an open platform for citizen science microbiome research.
mSystems. 2018;3(3). DOI: https://doi.org/10.1128/mSystems.00031-18
We accessed the files otu_table_BODY_HABITAT_UBERON_feces_json.biom and metadata_BODY_HABITAT_UBERON_feces_.txt at ftp://ftp.microbio.me/AmericanGut/ag-2017-12-04.
The phyloseq object ag.genus.rds is the result of preprocessing these files. (For the preprocessing code, see our GitHub repository.)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here we provide data and software for the work "Absence of enterotypes in the human gut microbiomes reanalyzed with non-linear dimensionality reduction methods".
Enterotypes of the human gut microbiome have been proposed as a powerful prognostic tool for evaluating the correlation between lifestyle, nutrition, and disease. However, the number of enterotypes suggested in the literature ranges from two to four. The growth of available metagenome data and the use of exact, non-linear methods of data analysis challenge the very concept of clusters in the multidimensional space of bacterial microbiomes.
We demonstrate the presence of a lower-dimensional structure in the microbiome space, with high-dimensional data concentrated near a low-dimensional non-linear submanifold, but the absence of distinct and stable clusters that could represent enterotypes. This observation is robust with regard to diverse combinations of dimensionality reduction techniques and clustering algorithms.
We used 16S rRNA genotype data from the National Institutes of Health Human Microbiome Project (HMP) and the American Gut Project (AGP), presented at the Order, Family, and Genus taxonomic levels (O, F, and G, respectively). These are the largest openly available datasets, and they provide a sufficient number of data points for correctly estimating the clustering partition and constructing a manifold. We used 4587 HMP samples from stool and rectum body sites, downloaded from https://portal.hmpdacc.org/, and 9511 AGP samples, downloaded from https://figshare.com/, as abundance matrices. For comparison with the original research, we also analyzed the Sanger, Illumina, and Pyroseq datasets from http://www.bork.embl.de/Docu/Arumugam_et_al_2011/.
All datasets are provided in data.zip. They were normalized by dividing the Operational Taxonomic Unit (OTU) values by the total sum of abundances for a given data sample.
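The normalization described above can be illustrated with a minimal sketch. It assumes an abundance matrix with one row per sample and one column per OTU; the function name normalize_abundances, the example values, and the handling of empty samples are illustrative assumptions and are not taken from the authors' software.

```python
import numpy as np


def normalize_abundances(counts):
    """Divide each sample's OTU values by that sample's total abundance.

    Assumes rows are samples and columns are OTUs (an assumption of this
    sketch; the layout of the files in data.zip may differ).
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Samples with a total of zero are left as all zeros instead of
    # producing a division-by-zero result (a choice made for this sketch).
    safe_totals = np.where(totals == 0, 1.0, totals)
    return counts / safe_totals


# Hypothetical toy matrix: 3 samples x 3 OTUs.
example = np.array([
    [10, 30, 60],
    [5, 5, 0],
    [0, 0, 0],
])
relative = normalize_abundances(example)
```

After this step, every non-empty row of relative sums to 1, so samples with different sequencing depths become comparable as relative abundances.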