Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Proportion of counts assigned to either true or spurious OTUs/ASVs.
https://spdx.org/licenses/CC0-1.0.html
Factors such as host species, phylogeny, diet, timing, and location of sampling are thought to influence the composition of gut-associated bacteria in insects. In this study, we compared the faecal-associated bacterial taxa of three Coenagrion and one Enallagma damselfly species. We expected high overlap in the representation of bacterial taxa because these species share ecology and diet. Using metabarcoding based on the 16S rRNA gene, we identified 1513 sequence variants (ZOTUs), representing distinct bacterial ‘taxa’. Intriguingly, the damselfly species differed somewhat in ZOTU richness, which ranged from 480 to 914 ZOTUs. In total, 921 (or 60.8% of the 1513) distinct ZOTUs were non-shared, each found in only one species, and then most often in only a single individual. There was a surfeit of these non-shared, incidental ZOTUs in the Enallagma species, which accounts both for its having the highest bacterial richness and for a sample-wide pattern of more single-species ZOTUs than expected under the null model. Future studies should address the extent to which faecal bacteria represent non-incidental gut bacteria and whether abundant and shared taxa are true gut symbionts.
Methods
To assess the faecal bacterial assemblages of damselflies, we targeted four predatory odonate species at a freshwater pond of approximately 600 m × 200 m (12 ha), located in Southern Finland (ETRS-TM35FIN N: 67118; E: 2460). On 1–2 June 2016, we collected 185 individuals (20–26 males and females from each species) for faecal DNA analysis. All our focal damselfly species belong to the family Coenagrionidae: Coenagrion lunulatum (Charpentier, 1840), Coenagrion hastulatum (Charpentier, 1825), Coenagrion pulchellum (Vander Linden, 1825), and Enallagma cyathigerum (Charpentier, 1840). Species identification of damselflies was based on current literature, e.g. [27]. These four target species were selected as they were the most common predatory species at the study site, based on pilot surveys (K. Kaunisto, pers. obs.). Only sexually mature individuals with adult colours and hardened wings were included in the study. According to a previous study [28], all four focal species feed mainly on dipteran prey by open foraging flights and by gleaning insects from vegetation.
Each damselfly was placed into a sterile 10-ml collection tube housing a piece of dampened paper towel to reduce desiccation risk. To allow for defecation, damselflies were kept in the tubes for the next 24 h (sufficient time for defecation to occur, according to [18]). After the live individuals had defecated into the tube, we froze the entire sample without removing the faeces or the damselfly. All faecal material was collected from the tubes with sterile forceps, after which the faeces were frozen in 15-ml Falcon tubes at −64 °C until further processing and analysis.
Sample Processing and Molecular Analysis
Total DNA was extracted as described in a previous study using NucleoSpin Tissue XS Kit (product nr 740901, Macherey-Nagel, Düren, Germany) [28].
To characterize the bacterial assemblages of the focal species, we used established metabarcoding protocols for dragonflies building on earlier optimization [18, 28]. To amplify the bacterial 16S rRNA gene (hypervariable region v4), we used primers 515F-Parada (also known as 515FB: 5′-GTG YCA GCM GCC GCG GTA A-3′; Parada et al. 2016) and 806R-Apprill (also known as 806RB: 5′-GGA CTA CNV GGG TWT CTA AT-3′; [29]). Each DNA sample was amplified in two separate reactions that were individually tagged and sequenced. The locus-specific PCR setup followed Kankaanpaa et al. [30] and included 5 μl of 2× MyTaq HS Red Mix (Bioline, UK), 2.4 μl of H2O, 150 nM of each primer (two forward and two reverse primer versions; total primer mix concentration 600 nM), and 2 μl of DNA extract per sample in a 10 μl volume. Cycling conditions were 3 min at 95 °C, then 35 cycles of 45 s at 95 °C, 1 min at 50 °C, and 1 min 30 s at 72 °C, ending with 10 min at 72 °C. In the second PCR stage, the first PCR products were modified by attaching Illumina-specific adapters and sample-specific indices. For a reaction volume of 10 μl in the indexing PCR, we mixed 5 μl of MyTaq HS Red Mix, 500 nM of each tagged and indexed primer (i7 and i5), and 3 μl of locus-specific PCR product from the first PCR phase. For this second PCR, we used the following protocol: initial denaturation for 3 min at 98 °C, then 15 cycles of 20 s at 95 °C, 15 s at 60 °C, and 30 s at 72 °C, followed by 3 min at 72 °C. All the indexed reactions were then pooled and purified using magnetic beads [31, 32].
Sequencing was done on an Illumina MiSeq v3 PE 2×300 (Illumina Inc., San Diego, CA, USA) run, including the PhiX control library, by the Turku Centre for Biotechnology, Turku, Finland. After sequencing, the reads were demultiplexed into each original sample and uploaded onto CSC servers (IT Center for Science, https://www.csc.fi/ ) for bioinformatic analysis.
Paired-end reads (13,027,754) were merged and trimmed for quality using 64-bit vsearch version 2.14.2 [33] command ‘fastq_mergepairs’ with the default options and ‘fastq_allowmergestagger’. Primers were removed from the merged reads (11,179,018) using software cutadapt version 1.14 (Martin 2011) with a 20% mismatch rate, a minimum length of 240 bp, and a truncate length of 270 bp (the excess nucleotides were trimmed from the 3′ end). Trimmed reads (11,050,385) were then collapsed into unique sequences (singletons removed) with command ‘fastx_uniques’ and option ‘minuniquesize’ set to 10 (49,832 uniques retrieved). Finally, reads were corrected for point errors to obtain an accurate set of amplicon sequences (=denoised) and filtered of chimeric amplicons (=chimeras were removed), resulting in 3803 ZOTUs (‘ZOTU’, ‘zero-radius OTU’), through command ‘unoise3’ using USEARCH version 11.0.667 with settings minsize = 8 and unoise_alpha = 2. The median and mean length of ZOTUs was 253 bp (SD ± 2.50 bp). Then ZOTUs were mapped back to the original trimmed reads with command ‘usearch_global’, using vsearch, to establish the total number of reads in each sample. We were able to map 10,627,197 of 11,050,385 reads (96.17%) to our original samples. The ZOTUs (sequence variants) were assigned to taxa using the 16S RDP database with the SINTAX (Edgar, 2010) probabilistic algorithm implemented in vsearch. The database ‘16S RDP training set v18’ (21k seqs) was downloaded from the usearch website (https://drive5.com/usearch/manual/sintax_downloads.html; accessed 19th April 2023). For the chosen database, the genus level is the lowest taxonomic level. For any taxonomic level, we only accepted assignations with 100% probability. The data was further filtered to remove artefacts, spurious reads, and non-targets based on information on the numerous control samples, technical replicates, and taxonomy.
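The read counts reported above can be tracked stage by stage. The following is a minimal Python sketch, not part of the authors' pipeline; the stage labels and the function name `retention` are illustrative assumptions, and whether the first figure counts read pairs or individual reads is not stated in the text, so only the ratios are shown.

```python
# Read counts as reported in the text; the stage labels are illustrative.
stages = [
    ("raw paired-end reads", 13_027_754),
    ("merged", 11_179_018),
    ("primer-trimmed", 11_050_385),
    ("mapped back to ZOTUs", 10_627_197),
]


def retention(stages):
    """Return (name, count, percent kept relative to the previous stage)."""
    rows = []
    previous = None
    for name, count in stages:
        pct = None if previous is None else 100.0 * count / previous
        rows.append((name, count, pct))
        previous = count
    return rows


for name, count, pct in retention(stages):
    kept = "" if pct is None else f" ({pct:.2f}% of previous stage)"
    print(f"{name}: {count:,}{kept}")
```

The last row reproduces the mapping rate given in the text (10,627,197 of 11,050,385 reads).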
First, we removed those ZOTUs from any sample that had fewer reads than extraction or PCR controls (9,833,618 reads retained). Then, we collapsed reads based on the taxonomy within each sample, that is, all the reads that were assigned to the same taxon in a sample were summed. Out of the 3803 ZOTUs, we identified 983 to genus, 1570 to family, 2002 to order, 3063 to class, 3319 to phylum, and 3482 to domain level. From the total ~10M reads, we identified 4.0M to genus, 4.4M to family, 8.5M to order, and 9.5M to the higher levels. Then, we removed taxa that were present in only one of a sample's two replicates and finally summed the reads in both replicates (9,678,663 reads left). Then, to remove potentially leaked ‘tag-jumped’ reads from the data, we removed from each sample all taxa that accounted for less than 0.05% of the total reads in that sample (9,636,233 reads saved). We removed all the taxa outside domains Bacteria or Archaea, as well as Class Chloroplast (9,006,117 reads passed the filtering). The non-targets included mainly plants (~6200 reads) and Fungi (~250 reads). Altogether 284,351 reads could not be assigned with the strict 100% probability threshold. Finally, very rare occurrences (sequence count < 20) were removed (9,004,996 final reads).
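The per-sample part of this filtering (replicate agreement, the 0.05% proportion cut-off, and the minimum count of 20) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: the function name `filter_sample`, the dictionary layout, and the genus names and counts in the example are hypothetical; the 0.05% proportion is assumed to be computed on the reads remaining after the replicate step; and the control-based and taxonomy-based filters are left out.

```python
def filter_sample(rep_a, rep_b, min_fraction=0.0005, min_count=20):
    """Filter one sample given two replicate dicts of {taxon: read_count}.

    Assumed order of steps (a simplification of the description above):
    1. keep taxa detected in both replicates and sum their reads,
    2. drop taxa below min_fraction of the sample's summed reads,
    3. drop taxa with fewer than min_count reads.
    """
    # Step 1: replicate agreement, then sum the two replicates.
    shared = {
        taxon: rep_a[taxon] + rep_b[taxon]
        for taxon in rep_a
        if rep_a.get(taxon, 0) > 0 and rep_b.get(taxon, 0) > 0
    }
    total = sum(shared.values())
    if total == 0:
        return {}
    # Step 2: proportion threshold against possible tag-jumped reads.
    kept = {t: n for t, n in shared.items() if n / total >= min_fraction}
    # Step 3: remove very rare occurrences.
    return {t: n for t, n in kept.items() if n >= min_count}


# Hypothetical example data (taxon names and counts are made up).
rep_a = {"Rickettsia": 6000, "Wolbachia": 30, "Bacillus": 2,
         "Acinetobacter": 6, "Pseudomonas": 15}
rep_b = {"Rickettsia": 5000, "Wolbachia": 25, "Bacillus": 1,
         "Acinetobacter": 4, "Serratia": 40}
print(filter_sample(rep_a, rep_b))
```

In this example, Pseudomonas and Serratia are dropped because each appears in only one replicate, Bacillus falls below the proportion threshold, and Acinetobacter passes the proportion threshold but has fewer than 20 reads.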
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Expected ratios (based on known copy numbers of the respective 16S rRNA gene variants) are shown in bold. USEARCH-UNOISE3 could not differentiate the two C. beijerinckii variants. Qiime2-Deblur could not differentiate any of the variants.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Multidrug and toxic compound extrusion (MATE) transporters comprise a multigene family that mediates multiple functions in plants through the efflux of diverse substrates including organic molecules, specialized metabolites, hormones, and xenobiotics. MATE classification based on genome-wide studies remains ambiguous, likely due to a lack of large-scale phylogenomic studies and/or reference sequence datasets. To resolve this, we established a phylogeny of the plant MATE gene family using a comprehensive kingdom-wide phylogenomic analysis of 74 diverse plant species. We identified more than 4,000 MATEs, which were classified into 14 subgroups based on a systematic bioinformatics pipeline using USEARCH, blast+ and synteny network tools. Our classification was performed using a four-step process, whereby MATEs sharing ≥ 60% protein sequence identity with a ≤ 1E-05 threshold at different sequence lengths (either full-length, ≥ 60% length, or ≥ 150 amino acids) or retained in similar synteny blocks were assigned to the same subgroup. In this way, we assigned subgroups to 95.8% of the identified MATEs, which we substantiated using synteny network clustering analysis. The subgroups were clustered under four major phylogenetic groups and named according to their clockwise appearance within each group. We then generated a reference sequence dataset, the usefulness of which was demonstrated in the classification of MATEs in additional species not included in the original analysis. Approximately 74% of the plant MATEs exhibited synteny relationships with angiosperm-wide or lineage-, order/family-, and species-specific conservation. Most subgroups evolved independently, and their distinct evolutionary trends were likely associated with the development of functional novelties or the maintenance of conserved functions.
Together with the systematic classification and synteny network profiling analyses, we identified all the major evolutionary events experienced by the MATE gene family in plants. We believe that our findings and the reference dataset provide a valuable resource to guide future functional studies aiming to explore the key roles of MATEs in different aspects of plant physiology. Our classification framework can also be readily extended to other (super) families.
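The sequence-similarity criteria in the description above (≥ 60% identity, ≤ 1E-05 threshold, three length categories) can be illustrated with a small Python sketch. It is an assumption-laden illustration, not the authors' pipeline: the `Hit` record, the function `match_tier`, and the reading of the three length categories as successive alignment-coverage tiers are all hypothetical, the ≤ 1E-05 threshold is assumed to be an E-value, and the synteny-block criterion is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """One protein similarity hit (hypothetical record layout)."""
    identity: float      # percent protein sequence identity
    evalue: float        # assumed to be the "<= 1E-05" threshold quantity
    aligned_length: int  # number of aligned residues
    query_length: int    # full length of the query protein


def match_tier(hit: Hit, min_identity: float = 60.0,
               max_evalue: float = 1e-5) -> Optional[str]:
    """Return the length tier a hit satisfies, or None if it fails."""
    if hit.identity < min_identity or hit.evalue > max_evalue:
        return None
    coverage = hit.aligned_length / hit.query_length
    if coverage >= 1.0:
        return "full-length"
    if coverage >= 0.6:
        return ">=60% length"
    if hit.aligned_length >= 150:
        return ">=150 aa"
    return None


# Made-up hits for illustration only.
print(match_tier(Hit(75.0, 1e-30, 480, 480)))
print(match_tier(Hit(62.0, 1e-10, 300, 480)))
print(match_tier(Hit(61.0, 1e-6, 160, 480)))
```

Checking the tiers from strictest to loosest means each hit is labelled with the strongest criterion it meets.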
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The two FASTA files contain multiple sequence alignments of sensor histidine kinase and response regulator sequences. The source sequences were obtained by BLAST, clustered with usearch, and aligned with muscle. More details can be found in Multamäki et al. 2021.
Raw sequence data can be handled with a fasta editor. OTU tables can be opened using Microsoft Excel.
Hierarchical clustering. Taxonomic assignment of reads was performed using a preexisting database of SSU rDNA sequences including XXX reference sequences generated by Sanger sequencing. Experimental amplicons (reads), sorted by abundance, were concatenated with the extracted reference sequences, sorted by decreasing length. All sequences, experimental and reference, were then clustered to 85% identity using the global alignment clustering option of the uclust module from the usearch v4.0 software (Edgar, 2010). Each 85% cluster was then reclustered at a higher stringency level (86%) and so on (87%, 88%,…) in a hierarchical manner up to 100% similarity. Each experimental sequence was then identified by the list of clusters to which it belonged at the 85% to 100% levels. This information can be viewed as a matrix in which the rows correspond to the different sequences and the columns to cluster membership at each clustering level. Taxonomic assignment for a given read was performed by first checking whether any reference sequences clustered with the experimental sequence at the 100% clustering level. If so, the last common taxonomic name of the reference sequence(s) within the cluster was used to assign the environmental read. If not, the same procedure was applied to the clusters from 99% down to 85% similarity, as necessary, until a cluster was found that contained both the experimental read and reference sequence(s), in which case the read was taxonomically assigned as described above.
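The assignment rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the original implementation: the data layout (`membership` as a per-level mapping from sequence ID to cluster ID, `ref_lineages` as root-to-tip tuples of names), the function names, and the example sequences and lineages are all hypothetical, and the clustering itself is taken as given.

```python
def last_common_name(lineages):
    """Deepest taxonomic name shared by all lineages (root-to-tip tuples)."""
    common = None
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        common = ranks[0]
    return common


def assign_read(read_id, membership, ref_lineages,
                levels=range(100, 84, -1)):
    """Assign a read at the most stringent level where it shares a cluster
    with at least one reference sequence.

    membership:   {level: {sequence_id: cluster_id}}
    ref_lineages: {reference_id: (name_rank1, name_rank2, ...)}
    Returns (level, name), or (None, None) if no level qualifies.
    """
    for level in levels:  # 100%, 99%, ..., 85%
        clusters = membership[level]
        cluster_id = clusters.get(read_id)
        if cluster_id is None:
            continue
        refs = [seq for seq, cid in clusters.items()
                if cid == cluster_id and seq in ref_lineages]
        if refs:
            return level, last_common_name([ref_lineages[r] for r in refs])
    return None, None


# Hypothetical example: read1 joins refA at 97% and both references at 90%.
ref_lineages = {
    "refA": ("Eukaryota", "Alveolata", "Dinophyceae"),
    "refB": ("Eukaryota", "Alveolata", "Ciliophora"),
}
membership = {}
for level in range(100, 84, -1):
    if level >= 98:
        membership[level] = {"read1": 0, "refA": 1, "refB": 2}
    elif level >= 91:
        membership[level] = {"read1": 0, "refA": 0, "refB": 1}
    else:
        membership[level] = {"read1": 0, "refA": 0, "refB": 0}

print(assign_read("read1", membership, ref_lineages))
```

Because the search starts at 100% and stops at the first level that contains a reference, the read takes its name from refA alone; had it only co-clustered with both references at 90%, the shared name of the two lineages would have been used instead.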