Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Genome binning of the gold standard pooled assembly. Refinement of the binning output of MaxBin 2.2.7, MetaBAT 2.12.1, and CONCOCT 1.0.0 with DAS Tool 1.1.2.
Software: DAS Tool
SoftwareVersion: 1.1.2
DataURL: https://data.cami-challenge.org/participate
SoftwareURL: https://github.com/cmks/DAS_Tool
DockerImage: cami/das_tool:1.1.2
IsBiobox: No
ShortReadsUsed: True
LongReadsUsed: False
CommandUsed: DAS_Tool -i binning_concoct1.0.0,binning_maxbin2.2.7,binning_metabat2.12.1 -c anonymous_gsa_pooled.fasta -o output --search_engine diamond
Software: CAMIARKQuikr
SoftwareVersion: 1.0.0
DataURL: https://data.cami-challenge.org/participate
SoftwareURL: https://doi.org/10.5281/zenodo.1730572
DockerImage: stefanjanssen/docker_profiling_tools:quickr
IsBiobox: True
BioboxYAMLFile: https://zenodo.org/record/3629567/files/biobox.yaml?download=1
ReferenceDatabase: https://doi.org/10.5281/zenodo.1730572
ShortReadsUsed: True
LongReadsUsed: False
CommandsUsed: docker run \
  --volume="/path/to/19122017_mousegut_scaffolds_yaml:/bbx/mnt/yaml:ro" \
  --volume="/path/to/19122017_mousegut_scaffolds:/bbx/mnt/input:ro" \
  --volume="/path/to/output:/bbx/mnt/output:rw" \
  --volume="/path/to/output/metadata:/bbx/metadata:rw" \
  --volume="/path/to/output/cache:/cache:rw" \
  stefanjanssen/docker_profiling_tools:quickr
Assembly of the first 10 short read samples
Software: metaSPAdes
SoftwareVersion: 3.13.0
DataURL: https://data.cami-challenge.org/participate
SoftwareURL: https://github.com/ablab/spades
ShortReadsUsed: True
LongReadsUsed: False
CommandUsed: conda create -n spades3130 spades=3.13.0-0
Sample0=/path/to/19122017_mousegut_scaffolds/2017.12.29_11.37.26_sample_0/reads/anonymous_reads.fq.gz
Sample1=/path/to/19122017_mousegut_scaffolds/2017.12.29_11.37.26_sample_1/reads/anonymous_reads.fq.gz
Sample2=/path/to/19122017_mousegut_scaffolds/2017.12.29_11.37.26_sample_2/reads/anonymous_reads.fq.gz
Sample3=/path/to/19122017_mousegut_scaffolds/2017.12.29_11.37.26_sample_3/reads/anonymous_reads.fq.gz
Sample4=/path/to/19122017_mousegut_scaffolds/2017.12.29_11.37.26_sample_4/reads/anonymous_reads.fq.gz
Sample5=/path/to/19122017_mousegut_scaffolds/2017.12.29_11.37.26_sample_5/reads/anonymous_reads.fq.gz
Sample6=/path/to/19122017_mousegut_scaffolds/2017.12.29_11.37.26_sample_6/reads/anonymous_reads.fq.gz
Sample7=/path/to/19122017_mousegut_scaffolds/2017.12.29_11.37.26_sample_7/reads/anonymous_reads.fq.gz
Sample8=/path/to/19122017_mousegut_scaffolds/2017.12.29_11.37.26_sample_8/reads/anonymous_reads.fq.gz
Sample9=/path/to/19122017_mousegut_scaffolds/2017.12.29_11.37.26_sample_9/reads/anonymous_reads.fq.gz
cat $Sample0 $Sample1 $Sample2 $Sample3 $Sample4 $Sample5 $Sample6 $Sample7 $Sample8 $Sample9 > Samples0-9_anonymous_reads.fq.gz
conda activate spades3130
/usr/bin/time -v metaspades.py --12 Samples0-9_anonymous_reads.fq.gz -o metaSPAdes3130-Sample0-9
Genome binning of the gold standard pooled assembly
Software: CONCOCT
SoftwareVersion: 1.0.0
DataURL: https://data.cami-challenge.org/participate
SoftwareURL: https://github.com/BinPro/CONCOCT
DockerImage: quay.io/biocontainers/concoct:1.0.0--py37h88e4a8a_5
IsBiobox: No
ShortReadsUsed: True
LongReadsUsed: False
CommandUsed: for i in {0..63}; do bowtie2 -q --threads 30 --fr -x anonymous_gsa_pooled.fasta --interleaved sample_${i}/anonymous_reads.fq -S anonymous_reads_sample_${i}.sam ; done
for i in {0..63}; do samtools view -b sample_${i}.sam -o anonymous_reads_sample_${i}.bam & done
for i in {0..63}; do samtools sort anonymous_reads_sample_${i}.bam -o anonymous_reads_sample_${i}.sorted.bam ; done
for i in {0..63}; do samtools index anonymous_reads_sample_${i}.sorted.bam ; done
cut_up_fasta.py anonymous_gsa_pooled.fasta -c 10000 -o 0 --merge_last -b contigs_10K.bed > contigs_10K.fa
concoct_coverage_table.py contigs_10K.bed /host/benchmarking/fmeyer/output/bowtie2/mouse_gut/sorted_bam/anonymous_reads_sample_*.sorted.bam > coverage_table.tsv
concoct --composition_file contigs_10K.fa --coverage_file coverage_table.tsv -b
merge_cutup_clustering.py clustering_gt1000.csv > clustering_merged.csv
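The cut_up_fasta.py step above splits contigs into 10 kb pieces (-c 10000) without overlap (-o 0) and, with --merge_last, folds a short trailing piece into the piece before it. The following is a minimal sketch of that chunking idea only, not CONCOCT's own code; the function name `cut_contig` and its interval representation are hypothetical.

```python
def cut_contig(length, chunk_size=10_000, merge_last=True):
    """Split a contig of `length` bp into half-open (start, end) intervals.

    Illustrative only: non-overlapping chunks of `chunk_size`; when
    `merge_last` is set, a trailing chunk shorter than `chunk_size` is
    merged into the preceding chunk instead of being emitted alone.
    """
    if length <= 0:
        raise ValueError("contig length must be positive")
    intervals = [
        (start, min(start + chunk_size, length))
        for start in range(0, length, chunk_size)
    ]
    last_start, last_end = intervals[-1]
    if merge_last and len(intervals) > 1 and last_end - last_start < chunk_size:
        intervals.pop()                      # drop the short tail
        prev_start, _ = intervals.pop()      # extend the previous chunk over it
        intervals.append((prev_start, last_end))
    return intervals


if __name__ == "__main__":
    for contig_length in (8_000, 15_000, 20_000, 25_000):
        print(contig_length, cut_contig(contig_length))
```

Under this scheme every piece except an uncut short contig is between 10 kb and just under 20 kb long, which keeps short tails from becoming separate low-information fragments.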
Genome coverage (short reads) averaged over the 64 samples of the CAMI 2 Mouse Gut Toy data set
Genome binning of the gold standard pooled assembly
Software: MaxBin
SoftwareVersion: 2.2.7
DataURL: https://data.cami-challenge.org/participate
SoftwareURL: https://sourceforge.net/projects/maxbin/
DockerImage: cami/maxbin:2.2.7
IsBiobox: No
ShortReadsUsed: True
LongReadsUsed: False
CommandUsed: run_MaxBin.pl -thread 16 -contig anonymous_gsa_pooled.fasta -out output -reads sample_0/reads/anonymous_reads.fq -reads2 sample_1/reads/anonymous_reads.fq -reads3 sample_2/reads/anonymous_reads.fq -reads4 sample_3/reads/anonymous_reads.fq -reads5 sample_4/reads/anonymous_reads.fq -reads6 sample_5/reads/anonymous_reads.fq -reads7 sample_6/reads/anonymous_reads.fq -reads8 sample_7/reads/anonymous_reads.fq -reads9 sample_8/reads/anonymous_reads.fq -reads10 sample_9/reads/anonymous_reads.fq -reads11 sample_10/reads/anonymous_reads.fq -reads12 sample_11/reads/anonymous_reads.fq -reads13 sample_12/reads/anonymous_reads.fq -reads14 sample_13/reads/anonymous_reads.fq -reads15 sample_14/reads/anonymous_reads.fq -reads16 sample_15/reads/anonymous_reads.fq -reads17 sample_16/reads/anonymous_reads.fq -reads18 sample_17/reads/anonymous_reads.fq -reads19 sample_18/reads/anonymous_reads.fq -reads20 sample_19/reads/anonymous_reads.fq -reads21 sample_20/reads/anonymous_reads.fq -reads22 sample_21/reads/anonymous_reads.fq -reads23 sample_22/reads/anonymous_reads.fq -reads24 sample_23/reads/anonymous_reads.fq -reads25 sample_24/reads/anonymous_reads.fq -reads26 sample_25/reads/anonymous_reads.fq -reads27 sample_26/reads/anonymous_reads.fq -reads28 sample_27/reads/anonymous_reads.fq -reads29 sample_28/reads/anonymous_reads.fq -reads30 sample_29/reads/anonymous_reads.fq -reads31 sample_30/reads/anonymous_reads.fq -reads32 sample_31/reads/anonymous_reads.fq -reads33 sample_32/reads/anonymous_reads.fq -reads34 sample_33/reads/anonymous_reads.fq -reads35 sample_34/reads/anonymous_reads.fq -reads36 sample_35/reads/anonymous_reads.fq -reads37 sample_36/reads/anonymous_reads.fq -reads38 sample_37/reads/anonymous_reads.fq -reads39 sample_38/reads/anonymous_reads.fq -reads40 sample_39/reads/anonymous_reads.fq -reads41 sample_40/reads/anonymous_reads.fq -reads42 sample_41/reads/anonymous_reads.fq -reads43 sample_42/reads/anonymous_reads.fq -reads44 sample_43/reads/anonymous_reads.fq -reads45 sample_44/reads/anonymous_reads.fq -reads46 sample_45/reads/anonymous_reads.fq -reads47 sample_46/reads/anonymous_reads.fq -reads48 sample_47/reads/anonymous_reads.fq -reads49 sample_48/reads/anonymous_reads.fq -reads50 sample_49/reads/anonymous_reads.fq -reads51 sample_50/reads/anonymous_reads.fq -reads52 sample_51/reads/anonymous_reads.fq -reads53 sample_52/reads/anonymous_reads.fq -reads54 sample_53/reads/anonymous_reads.fq -reads55 sample_54/reads/anonymous_reads.fq -reads56 sample_55/reads/anonymous_reads.fq -reads57 sample_56/reads/anonymous_reads.fq -reads58 sample_57/reads/anonymous_reads.fq -reads59 sample_58/reads/anonymous_reads.fq -reads60 sample_59/reads/anonymous_reads.fq -reads61 sample_60/reads/anonymous_reads.fq -reads62 sample_61/reads/anonymous_reads.fq -reads63 sample_62/reads/anonymous_reads.fq -reads64 sample_63/reads/anonymous_reads.fq
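The MaxBin command above follows a regular pattern: the first read file is passed with -reads, and sample k-1 is passed with -readsK for K = 2 to 64. As a sketch of how such an argument list could be generated rather than typed by hand, the hypothetical helper `maxbin_reads_args` below rebuilds it; the helper name and the path template are illustrative assumptions, while the flags and file names are taken from the recorded command.

```python
def maxbin_reads_args(n_samples, template="sample_{i}/reads/anonymous_reads.fq"):
    """Build the -reads, -reads2, ..., -readsN argument list for run_MaxBin.pl.

    Sample indices start at 0, while the flag suffix starts at 2 for the
    second file, so sample i is paired with -reads{i + 1} (plain -reads for i = 0).
    """
    args = []
    for i in range(n_samples):
        flag = "-reads" if i == 0 else f"-reads{i + 1}"
        args.extend([flag, template.format(i=i)])
    return args


if __name__ == "__main__":
    command = [
        "run_MaxBin.pl", "-thread", "16",
        "-contig", "anonymous_gsa_pooled.fasta",
        "-out", "output",
    ] + maxbin_reads_args(64)
    # Print the command for inspection; running it would require MaxBin itself.
    print(" ".join(command))
```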
Taxonomic binning of the gold standard pooled assembly
Software: PhyloPythiaS+
SoftwareVersion: 1.4
DataURL: https://data.cami-challenge.org/participate
SoftwareURL: https://github.com/algbioi/ppsp
DockerImage: cami/ppsp:1.4
IsBiobox: False
ReferenceDatabase: RefSeq 93, SILVA 132
Taxonomy: NCBI 2018-02-26
ShortReadsUsed: False
LongReadsUsed: False
CommandUsed: run_ppsp.py --pipelineDir ppsp_pipepline --inputFastaFile anonymous_gsa_pooled.fasta --databaseFile ncbi_taxonomy --refSeq refseq93 --s16Database SILVA_132 --mgDatabase reference_NCBI201502/mg5
Additional file 1: Table S1. Binning results for CAMI-high datasets.
Software: MetaPhlAn
SoftwareVersion: 2.9.21
DataURL: https://data.cami-challenge.org/participate
SoftwareURL: https://bitbucket.org/biobakery/metaphlan2
DockerImage: cami/metaphlan:2.9.21
IsBiobox: True
BioboxYAMLFile: https://zenodo.org/record/3629567/files/biobox.yaml?download=1
ReferenceDatabase: mpa_v29_CHOCOPhlAn_201901
ShortReadsUsed: True
LongReadsUsed: False
CommandsUsed: docker run \
  --volume="/path/to/19122017_mousegut_scaffolds_yaml:/bbx/mnt/yaml:ro" \
  --volume="/path/to/19122017_mousegut_scaffolds:/bbx/mnt/input:ro" \
  --volume="/path/to/output:/bbx/mnt/output:rw" \
  --volume="/path/to/output/metadata:/bbx/metadata:rw" \
  --volume="/path/to/output/cache:/cache:rw" \
  --volume="/path/to/reference_database:/exchange/db:rw" \
  cami/metaphlan:2.9.21
Genome binning of viral entities from bulk metagenomics data
Authors
Joachim Johansen1,2, Damian R. Plichta2, Jakob Nybo Nissen1,3, Marie Louise Jespersen1,4, Shiraz A. Shah5, Ling Deng6, Jakob Stokholm5,6, Hans Bisgaard5, Dennis Sandris Nielsen6, Søren Sørensen7, Simon Rasmussen1
Affiliations
1 Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen N, Denmark
2 Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
3 Statens Serum Institut, Viral & Microbial Special diagnostics, Copenhagen, Denmark
4 National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
5 Copenhagen Prospective Studies on Asthma in Childhood (COPSAC), Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
6 Section of Food Microbiology and Fermentation, Department of Food Science, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
7 Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
Methods description
We compared the viral binning performance of VAMB and MetaBAT2, using the official CAMI consortium method to create assemblies and metagenome profiles. To this end, we generated three metagenome compositions with up to 308 reference genomes: (1) a mixture of bacteria, plasmids, and viruses to test binning in complex samples, i.e. high diversity; (2) a set containing only crAss-like viruses to test binning of highly similar viruses, i.e. high relatedness; and (3) a set of small viruses (<6,000 bp), including members of the Microviridae family, to address size bias. Bacterial genomes were gathered from NCBI's RefSeq genome repository (2021), plasmids from the PLSDB database (v. 2021_06_23), and viral genomes from the recent MGV database.
Dataset A contained a mixture of bacteria (N=8), plasmids (N=20), and viruses (N=280) to test binning in complex samples, i.e. high diversity. Dataset B contained only crAss-like viruses (N=80) to test binning of highly similar viruses, i.e. high relatedness. Dataset C contained small viruses (N=50, <6,000 bp) of the Microviridae family to address size bias. Bacterial genomes were sampled from the RefSeq genome repository (2021), plasmids from the PLSDB database, and viral genomes from the recent MGV database (Nayfach et al., Nature Microbiology 2021).
Reconstructing the genomes of microbial community members is key to the interpretation of shotgun metagenome samples. Genome binning programs deconvolute reads or assembled contigs of such samples into individual bins, but assessing their quality is difficult due to the lack of evaluation software and standardized metrics. We present AMBER, an evaluation package for the comparative assessment of genome reconstructions from metagenome benchmark data sets. It calculates the performance metrics and comparative visualizations used in the first benchmarking challenge of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). As an application, we show the outputs of AMBER for eleven different binning programs on two CAMI benchmark data sets. AMBER is implemented in Python and available under the Apache 2.0 license on GitHub.
This is a mock dataset of 120 000 artificial contigs of 1 kb length, derived by simulating reads from 295 unique genomes and 44 species with two or three strain genomes each, using the ART read simulator (Huang et al., 2012) and a lognormal abundance distribution. Genomes were chosen according to the CAMI2015 (www.cami-challenge.org) medium complexity toy dataset. The dataset contains four replicate samples with varied abundances and corresponding sequence feature files in MGLEX v0.1.1 format to use for genome reconstruction. Our aim was to create a benchmark dataset under controlled settings, minimizing potential biases introduced by specific software. This package also includes MGLEX benchmark scripts.
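To illustrate what a lognormal abundance distribution across replicate samples can look like, the sketch below draws one relative abundance per genome and normalizes the draws to sum to 1. It is not the procedure used to build this dataset: the function name, the `mu` and `sigma` values, the seeds, and the use of 295 as the genome count are all illustrative assumptions.

```python
import random


def lognormal_abundances(n_genomes, mu=1.0, sigma=2.0, seed=0):
    """Draw relative abundances for `n_genomes` from a lognormal distribution.

    `mu` and `sigma` are the mean and standard deviation of the underlying
    normal distribution (hypothetical values). The draws are normalized so
    that the abundances of one sample sum to 1.
    """
    rng = random.Random(seed)
    raw = [rng.lognormvariate(mu, sigma) for _ in range(n_genomes)]
    total = sum(raw)
    return [value / total for value in raw]


# Four replicate samples with varied abundances: one independent draw per seed.
replicates = [lognormal_abundances(295, seed=s) for s in range(4)]

if __name__ == "__main__":
    for index, sample in enumerate(replicates):
        print(f"replicate {index}: max abundance {max(sample):.3f}, "
              f"min abundance {min(sample):.2e}")
```

Because the lognormal distribution is heavy-tailed, a few genomes dominate each sample while most are rare, which is the property that makes such profiles a demanding test for binning.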
Additional file 6: Table S6. Evaluation results on chicken gut metagenomic datasets.
CAMI Airways dataset from the toy Human Microbiome Project dataset of the second Critical Assessment of Metagenomic Interpretation. Contains the following files.
- Contigs file
- Paths file (metaSPAdes)
- Assembly graph file (metaSPAdes)
- Abundance file
- Binning results (including CheckM results)
Software: mOTUs
SoftwareVersion: 2.5.1
DataURL: https://data.cami-challenge.org/participate
SoftwareURL: https://motu-tool.org/
DockerImage: cami/motus:2.5.1
IsBiobox: False
ReferenceDatabase: mOTUs database version 2.5.0
ShortReadsUsed: True
LongReadsUsed: False
CommandsUsed: for i in {0..63}; do motus profile -f sample_$((i))/reads/anonymous_reads_r1.fq -r sample_$((i))/reads/anonymous_reads_r2.fq -n $((i)) -C precision > sample$((i)).profile ; done
cat sample*.profile > cami2_mouse_gut_motus2.5.1.profile
CAMI GI dataset from the toy Human Microbiome Project dataset of the second Critical Assessment of Metagenomic Interpretation. Contains the following files.
- Contigs file
- Paths file (metaSPAdes)
- Assembly graph file (metaSPAdes)
- Abundance file
- Binning results (including CheckM results)
Multisample dataset and results for the CAMI datasets. Contains the following files.
- Contigs files
- Paths files (metaSPAdes)
- Assembly graph files (metaSPAdes)
- Abundance files
- Binning results (including CheckM results)
CAMI binning summary table. The number of bins recovered at different quality thresholds (determined with AMBER) from the CAMI challenge data, using the original binning software (MetaBAT2, MaxBin2, CONCOCT) and software that consolidates the original bin sets (DAS_Tool, Binning_refiner, metaWRAP). MetaWRAP was run with default parameters. Performance is shown for “unique strain” genomes (ANI < 95% to any other genome). (XLSX 39 kb)
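A summary of this kind counts, for each binner, how many bins pass a given pair of quality cutoffs. The sketch below shows that tally under stated assumptions: the bin values are invented example data, and the cutoffs (completeness above 50%, 70%, or 90% with contamination below 10% or 5%) are illustrative choices, not the thresholds used in the table.

```python
def count_bins(bins, min_completeness, max_contamination):
    """Count bins whose completeness exceeds and contamination stays below the cutoffs.

    `bins` is an iterable of (completeness, contamination) pairs given as
    fractions between 0 and 1.
    """
    return sum(
        1
        for completeness, contamination in bins
        if completeness > min_completeness and contamination < max_contamination
    )


# Hypothetical bins for one binner: (completeness, contamination).
example_bins = [(0.95, 0.02), (0.80, 0.04), (0.75, 0.12), (0.55, 0.08), (0.30, 0.01)]

# Hypothetical quality thresholds, from lenient to strict.
thresholds = [(0.5, 0.10), (0.7, 0.10), (0.9, 0.05)]

if __name__ == "__main__":
    for min_comp, max_cont in thresholds:
        n = count_bins(example_bins, min_comp, max_cont)
        print(f">{min_comp:.0%} complete, <{max_cont:.0%} contaminated: {n} bins")
```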
Additional file with interactive charts for all CAMI toy set results in default, very-precise, and very-sensitive modes. File prefixes S, M, and H denote low, medium, and high complexity, respectively. (TAR 3573 kb)