Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
List of 572 barcode sequences used in statistical analyses. The region targeted is noted, along with the number of reads recovered for that sequence in each sample (n, 50). Corresponding scientific paper: Boggs, L.M.; Scheible, M.K.; Machado, G.; Meiklejohn, K.A. Single Fragment or Bulk Soil DNA Metabarcoding: Which is Better for Characterizing Biological Taxa Found in Surface Soils for Sample Separation? Genes 2019, 10, 431
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Aligned DNA sequence matrix for phylogenetic analyses of the article "A bizarre new species of Lynchius (Amphibia, Anura, Strabomantidae) from the Andes of Ecuador and first report of Lynchius parkeri in Ecuador"
The matrix is in NEXUS format. Genes are arranged as follows:
RAG1: 1-652; Tyrosinase: 653-1195; 12S RNA: 1196-2242; tRNA Val: 2243-2313; 16S RNA: 2314-3994; tRNA Leu: 3995-4065; ND1: 4066-5026; tRNA Ile: 5027-5144.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The file pan_matrix.txt is a large table (tab-separated columns) in which each row corresponds to a genome and each column to a domain sequence family. The rows are named by BIOID code; see map_ecoli.txt to look up the strain names. The columns are named Cluster 1, Cluster 2, etc. The corresponding Pfam-A domain sequence is given in the file cluster_info.txt (see below). Cell (i,j) of this table holds the number of occurrences of domain sequence j in genome i.
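As a rough illustration of the layout described above, the following sketch reads a tab-separated genome-by-cluster count table. It assumes a header row of column names and the genome identifier in the first column; the sample rows and the identifiers G001 and G002 are hypothetical stand-ins, not contents of the real pan_matrix.txt.

```python
import csv
import io

# Hypothetical miniature stand-in for pan_matrix.txt. The header layout
# (ID column followed by "Cluster N" columns) is an assumption.
SAMPLE = (
    "BIOID\tCluster 1\tCluster 2\tCluster 3\n"
    "G001\t1\t0\t2\n"
    "G002\t1\t3\t0\n"
)

def read_pan_matrix(handle):
    """Return {genome_id: {cluster_name: count}} from a tab-separated table."""
    reader = csv.reader(handle, delimiter="\t")
    clusters = next(reader)[1:]  # skip the ID column header
    matrix = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        matrix[row[0]] = {c: int(v) for c, v in zip(clusters, row[1:])}
    return matrix

def genomes_with(matrix, cluster):
    """List the genomes in which the given cluster occurs at least once."""
    return sorted(g for g, counts in matrix.items() if counts[cluster] > 0)

pan = read_pan_matrix(io.StringIO(SAMPLE))
print(pan["G001"]["Cluster 3"])         # count for cell (G001, Cluster 3)
print(genomes_with(pan, "Cluster 2"))   # genomes carrying Cluster 2
```

For the real file, the same function could be given an open file handle instead of the in-memory sample.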
Monotropoideae (Ericaceae) is a wholly leafless and holomycotrophic group of primarily temperate herbs with centers of diversity in western North America and east Asia. The eleven genera are structurally diverse and also vegetatively reduced, making relationships difficult to assess based on morphology. Previous molecular analyses have focused primarily on segments of the ribosomal RNA repeat and yielded sometimes conflicting topologies. We employed a genomic sampling approach to obtain 102 nuclear loci and plastid coding loci for nine of the genera, as well as sampling ITS-26S and plastid rps2 for a broader set of accessions via PCR and Sanger sequencing. Data filtering for character completeness had a clear effect on relationships and branch support. Nuclear and plastid loci agree on a topology that resolves Allotropa and Hemitomes as sisters and Monotropsis sister to Eremotropa+Monotropa+Monotropastrum, relationships that were unclear from previous analyses. Hypopitys should be ...
Data were collected using Illumina sequencing and a low-coverage genome skimming approach.
# Monotropoid Ericaceae 102-locus nuclear sequences matrices and plastid locus sequence matrix
https://doi.org/10.5061/dryad.7h44j1017
These matrices were derived from genome-skimming runs using Illumina sequencing technology. After the pools of reads were obtained, they were mapped to Angiosperm353 target sequences from Monotropa uniflora, which were obtained from the supplementary data in the paper that described that probe set (Johnson et al., 2019, Systematic Biology 68: 594-606). This allowed us to recover sequences from our reads that matched the Angiosperm353 orthologs. 102 of these were assembled into a concatenated dataset for analysis of monotropoid relationships and filtered to different levels of individual base-position completeness: 100% complete, 80% complete, 50% complete, and unfiltered (all data included). These matrices are provided in NEXUS format.
We also assembled plastid genomes from the skimming reads and de novo mapp...
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Aligned DNA sequence matrix for phylogenetic analyses in the article "Description and phylogenetic relationships of a new trans-Andean species of Elachistocleis Parker 1927 (Amphibia, Anura, Microhylidae)"
Gene partitions are arranged as follows (tRNAs are included as part of larger adjacent genes):
16S = 1-1165;
BDNFcodonPos1 = 1166 - 1874\3;
BDNFcodonPos2 = 1167 - 1875\3;
BDNFcodonPos3 = 1168 - 1876\3;
cmyccodonPos1 = 1878 - 2319\3;
cmyccodonPos2 = 1879 - 2320\3;
cmyccodonPos3 = 1877 - 2318\3;
CO1codonPos1 = 2321 - 2981\3;
CO1codonPos2 = 2322 - 2979\3;
CO1codonPos3 = 2323 - 2980\3;
histcodonPos1 = 2983 - 3307\3;
histcodonPos2 = 2984 - 3308\3;
histcodonPos3 = 2982 - 3309\3;
siacodonPos1 = 3311 - 3704\3;
siacodonPos2 = 3312 - 3705\3;
siacodonPos3 = 3310 - 3706\3;
tyrcodonPos1 = 3708 - 4263\3;
tyrcodonPos2 = 3709 - 4264\3;
tyrcodonPos3 = 3707 - 4262\3;
28S = 4265-5084;
12S = 5085-6171;
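The partition definitions above use NEXUS-style ranges, where a trailing \3 means "every third site" (one codon position). A minimal sketch of how such a range expands into 1-based alignment columns is given below; the function name expand_charset is hypothetical, and only the simple "start-end" and "start-end\3" forms shown in the list are handled.

```python
import re

# Matches "start - end" with an optional "\step" and optional trailing ";".
_RANGE = re.compile(r"\s*(\d+)\s*-\s*(\d+)\s*(?:\\\s*(\d+))?\s*;?\s*")

def expand_charset(spec):
    """Expand a NEXUS-style site range into a list of 1-based site numbers."""
    m = _RANGE.fullmatch(spec)
    if m is None:
        raise ValueError(f"unrecognised range: {spec!r}")
    start, end = int(m.group(1)), int(m.group(2))
    step = int(m.group(3)) if m.group(3) else 1
    return list(range(start, end + 1, step))

# First codon position of BDNF, as listed above.
bdnf_pos1 = expand_charset(r"1166 - 1874\3;")
print(bdnf_pos1[:3], len(bdnf_pos1))

# A contiguous block with no stride.
print(len(expand_charset("1-1165;")))
```

Subtracting 1 from each site number gives 0-based indices suitable for slicing sequence strings.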
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the DNA sequence alignment for the 1083 OTUs used in Thornhill et al. 2017. We defined monophyletic OTUs at the finest scale possible given data availability and current understanding of the evolutionary relationships of Californian plant lineages. Using all 5258 species described in The Jepson eFlora (http://ucjeps.berkeley.edu/eflora/) as a starting point, a thorough literature search was undertaken to find molecular phylogenetic studies that had included California taxa. Genera were split into finer level OTUs if robust evidence existed for monophyly of subclades and representative DNA data either were available in GenBank or could be generated within the scope of the project. Genera were lumped in a few cases if recent evidence showed that one is nested in another. In total, 1083 OTUs were defined to include the 5258 binomials (Table S2 in Thornhill et al. 2017 details the OTU to which each binomial was assigned).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Aligned DNA sequence matrix for phylogenetic analyses of the article "A new glassfrog of the genus Centrolene (Amphibia: Centrolenidae) from the Subandean Kutukú Cordillera, eastern Ecuador"
The matrix is in NEXUS format and has 6626 bp and 239 terminals.
Partitions are as follows:
https://uow.libguides.com/uow-ro-copyright-all-rights-reserved
Hadamard mathematical matrices for weighing.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A RAD sequence matrix in phylip format of 17 Carabus species (89 individual samples), assembled with pyRAD. pyRAD parameters: sequence similarity=70% and minimum number of taxa=45.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The ASCII PSSM (position-specific scoring matrix) file generated from MSA-A using PSI-BLAST. (ASCII 5 kb)
Molecular phylogenetic research has relied on the analysis of the coding sequences of genes or of the amino acid sequences of the encoded proteins. Enumerating the numbers of mismatches, being indicators of mutation, has been central to pertinent algorithms. However, the constraining forces of selection and self-organization have been unaccounted for in conventional approaches, possibly causing available models to fall short of representing the actual evolutionary history. Specific amino acids possess quantifiable characteristics that enable the conversion from “words” (strings of letters denoting amino acids or bases) to “waves” (strings of quantitative values representing the physico-chemical properties) or to matrices (coordinates representing the positions in a comprehensive property space). The application of such numerical representations to evolutionary analysis takes into account not only mutation but also selection/self-organization as influences that drive speciation, because ...
# Beyond Mutations: Accounting for Selection and Self-Organization in the Analysis of Protein Evolution
https://doi.org/10.5061/dryad.tht76hf63
Publicly accessible sequences were collected from the NCBI landmark model organisms, and representatives of diverse clades were then added from NCBI Nucleotide.
Data was derived from the following sources:
NA
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sequence sets, alignments and tree files associated with Lawley, J.W., Gamero-Mora, E., Maronna, M.M., Chiaverano, L.M., Stampar, S.N., Hopcroft, R.R., Collins, A.G., Morandini, A.C. 2021. The importance of molecular characters when morphological variability hinders diagnosability: systematics of the moon jellyfish genus Aurelia (Cnidaria: Scyphozoa). PeerJ https://doi.org/10.7717/peerj.11954.
A description of the attached files follows. When applicable, replace the word (Marker) with 16S, COI, ITS1 or 28S (for COI files, disregard the "-ip_ia1" that appears in parentheses in filenames). For the code used in the molecular analyses based on these files, see github.com/lawleyjw/Aurelia.
- (Marker)-Aurelia-seqs.fasta - Sequence set used to generate (Marker)-Aurelia(-ip_ia1).fasta. Sequence IDs normally appear as ">(isolate/accession)-(previous species ID)-(sampling locality)".
- (Marker)-Aurelia-seqs-UPDATED.fasta - Same sequence set as above, but sequence IDs appear with updated species names as ">(accession)-(updated species ID)|(previous sequence ID, as in (Marker)-Aurelia-seqs.fasta above)".
- (Marker)-Aurelia(-ip_ia1).fasta - Alignment used to generate (Marker)-Aurelia.tre. Sequence IDs normally appear as ">(isolate/accession)-(previous species ID)-(sampling locality)".
- (Marker)-Aurelia(-ip_ia1)-UPDATED.fasta - Same alignment as above, but sequence IDs appear with updated species names as ">(accession)-(updated species ID)|(previous sequence ID, as in (Marker)-Aurelia(-ip_ia1).fasta above)".
- (Marker)-Aurelia.tre - Parsimony tree file in Newick format with branch lengths, derived from (Marker)-Aurelia(-ip_ia1).fasta; for Goodman-Bremer support values and bootstrap resampling frequencies see Fig. S4-S7 in Lawley et al. (2021).
- concat-Aurelia.nex - Concatenated sequence matrix in Nexus format (from Sequence Matrix), including sequences of all markers (based on single-marker alignments) for some representative specimens of each species. This file was used to generate concat-Aurelia.tre and concatML-Aurelia.tre. For details on species composition and species ID see Table S5 in Lawley et al. (2021).
- concat-Aurelia.tre - Concatenated parsimony tree file in Newick format with branch lengths, derived from concat-Aurelia.nex. For Goodman-Bremer support values and bootstrap resampling frequencies see Fig. 9 in Lawley et al. (2021).
- concatML-Aurelia.tre - Concatenated maximum likelihood tree file in Nexus format (from FigTree), derived from concat-Aurelia.nex, including SH-aLRT and ultrafast bootstrap values, respectively (see Fig. S3 in Lawley et al., 2021).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We combined the COI sequence data with legacy multigene sequence data to create a new, taxon-rich phylogeny for the Amaurobioidinae. We used sequences for four loci that have been used in previous studies on the subfamily: two mitochondrial loci, COI (658bp) and ribosomal subunit 16S (16S, 410bp); and two nuclear loci, Histone H3 (H3, 327bp) and ribosomal subunit 28S (28S, 839bp). We complemented the Amaurobioidinae data with sequences from several non-amaurobioidine anyphaenids and two clubionids as outgroups. Sequence alignment was performed using the MAFFT (ver. 7.308) plugin in Geneious, allowing MAFFT to automatically select an appropriate alignment strategy based on the properties of each locus, or with the online MAFFT server (https://mafft.cbrc.jp), which consistently selected the L-INS-i algorithm. Finally, alignments of the four loci were concatenated to construct a 2234 bp multigene sequence matrix containing 692 taxa, with about 55% missing/gap data (“full” matrix henceforth). To ensure that excessive missing data did not affect the resulting topology, we also constructed a reduced matrix by removing additional COI-only specimens so that each species and morphotype was represented by just one or two specimens for which all loci were available (where possible). After realignment, this reduced matrix was 2235 bp long, included 167 taxa, and had about 22% missing/gap data (“reduced” matrix henceforth). Phylogenetic analyses under maximum likelihood, including model selection, were then conducted with IQ-TREE 2. We performed phylogenetic analyses on both concatenated matrices (the full matrix and the reduced matrix) and on each individual locus. For model selection, we provided an initial scheme that partitioned the matrix by locus, and further partitioned the protein-coding loci (COI and H3) by codon position. We used ModelFinder and searched for the best partition scheme, all in IQ-TREE. 
The best models (partitions) for the full dataset were: GTR+F+I+G4 (16S), GTR+F+I+I+R4 (28S), TVM+F+I+I+R2 (COI-1), TIM2+F+R4 (COI-2), GTR+F+R5 (COI-3), TVMe+G4 (H3-1-H3-2), SYM+G4 (H3-3); and for the reduced dataset: GTR+F+I+G4 (16S), GTR+F+I+G4 (28S), GTR+F+I+G4 (COI-2), GTR+F+I+G4 (COI-3), TVM+F+I+G4 (COI-1, H3-2), GTR+F+I+G4 (H3-1), GTR+F+I+G4 (H3-3). For each dataset, once the best models and partitions were defined, we executed 10 independent replicates of tree calculations followed by 1000 ultrafast bootstrap replicates, and the replicate reaching the maximum likelihood was chosen. Phylogenetic analyses under parsimony were made with TNT, under equal weights, using the “new technology” search with default values, asking for 10 independent hits to the minimal length, and submitting the resulting trees to a round of TBR branch swapping.
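The length of the full matrix follows from the four locus lengths quoted above (658 + 410 + 327 + 839 = 2234 bp). The sketch below derives that total and the column range each locus would occupy after concatenation. The locus order is an assumption made for illustration; the description does not state the order used in the deposited matrix.

```python
# Locus lengths as quoted in the description; the ORDER is an assumption.
LOCI = [("COI", 658), ("16S", 410), ("H3", 327), ("28S", 839)]

def concat_ranges(loci):
    """Return ({locus: (first_col, last_col)}, total_length), 1-based inclusive."""
    ranges = {}
    start = 1
    for name, length in loci:
        ranges[name] = (start, start + length - 1)
        start += length
    return ranges, start - 1

ranges, total = concat_ranges(LOCI)
for name, (first, last) in ranges.items():
    print(f"{name}: {first}-{last}")
print("total:", total)
```

The reduced matrix is one column longer (2235 bp) because it was realigned, so ranges computed this way apply only to the full matrix.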
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
We have deposited data and results files that support the molecular phylogenetic analyses presented in the study. Raw Illumina reads and contigs representing UCE loci have been deposited at the NCBI Sequence Read Archive and GenBank, respectively (BioProject# PRJNA615631). All newly generated COI sequences have been deposited at GenBank (MT267540-MT267668). Here we have deposited the concatenated UCE matrix, the COI matrix, all Trinity contigs, all tree files, unfiltered alignment files, and additional data analysis files (partitioning schemes, log files). The methods used to generate these data are described below and in the accompanying paper.
DNA sequence generation: We selected 130 specimens for inclusion in molecular phylogenomic analysis (Table S1): 128 Syscia and two outgroup specimens from the genus Ooceraea. All sequence data were newly generated for this study, except for 5 samples, for which data were extracted from Oxley et al. (2014; Genome), Branstetter et al. (2017), and Borowiec (2019) (see Table S1). Vouchers were designated for each extraction and may be the same specimen (non-destructive DNA extraction) or, with varying degrees of subjectivity, a specimen from the same nest, collection series, or, rarely, population. Full voucher specimen details are in Supplementary Material, Table S2.
To examine species boundaries and phylogenetic relationships among species and populations, we employed the UCE approach to phylogenomics (Faircloth et al. 2012, Faircloth et al. 2015, Branstetter et al. 2017), a method that combines targeted enrichment of ultraconserved elements (UCEs) with multiplexed, next-generation sequencing. All UCE molecular work was performed following the UCE methodology described in Branstetter et al. (2017). Briefly, the process involves DNA extraction, sample QC, DNA fragmentation (400-600 bp), library preparation, library pooling (equimolar pools of 10 or 11 samples), UCE enrichment, qPCR quantification, final pooling (up to 102 samples per sequencing pool), and sequencing. All sequencing was performed on an Illumina HiSeq 2500 instrument (2x125 bp v4 chemistry; Illumina Inc., San Diego, CA) by the University of Utah genomics core facility. To enrich UCE loci, we used an ant-customized bait set (“ant-specific hym-v2”) that includes 9,898 baits (120 mer) targeting 2,524 UCE loci shared across Hymenoptera and a set of legacy markers (data not used) (Branstetter et al. 2017). The ability of this bait set to successfully enrich UCE loci and resolve relationships in ants has been demonstrated in several studies (Branstetter et al. 2017, Pierce et al. 2017, Ward and Branstetter 2017, Blaimer et al. 2018, Branstetter and Longino 2019, Longino and Branstetter 2020).
UCE matrix assembly: After sequencing, the University of Utah bioinformatics core demultiplexed the data using bcl2fastq v1.8 (Illumina, 2013) and made the data available for download. Once received, the sequence data were cleaned, assembled and aligned using PHYLUCE v1.6 (Faircloth 2016), which includes a set of wrapper scripts that facilitates batch processing of large numbers of samples. Within the PHYLUCE environment, we used the programs ILLUMIPROCESSOR v2.0 (Faircloth 2013), which incorporates TRIMMOMATIC (Bolger et al. 2014), for quality trimming raw reads, TRINITY v2013-02-25 (Grabherr et al. 2011) for de novo assembly of reads into contigs, and LASTZ v1.0 (Harris 2007) for identifying UCE contigs from all contigs. All optional PHYLUCE settings were left at default values for these steps. For the bait sequences file needed to identify and extract UCE contigs, we used the ant-specific hym-v2 bait file. To calculate assembly statistics, including sequencing coverage, we used scripts from the PHYLUCE package (phyluce_assembly_get_trinity_coverage and phyluce_assembly_get_trinity_coverage_for_uce_loci) that call the programs BWA v 0.7.7 (Li and Durbin 2010) and GATK v3.8 (McKenna et al. 2010).
After extracting UCE contigs, we aligned each UCE locus using a stand-alone version of the program MAFFT v7.130b (Katoh and Standley 2013) and the L-INS-i algorithm. We then used a PHYLUCE wrapper to trim flanking regions and poorly aligned internal regions using the program GBLOCKS (Talavera and Castresana 2007). The program was run with reduced stringency parameters (b1:0.5, b2:0.5, b3:12, b4:7). We then used another PHYLUCE script to filter the initial set of alignments so that each alignment was required to include data for ≥ 90% of taxa. This resulted in a final set of 1,388 alignments and 1,035,633 bp of sequence data for analysis. To calculate summary statistics for the final data matrix, we used a script from the PHYLUCE package (phyluce_align_get_align_summary_data). Information related to UCE sequencing and assembly results can be found in Supplemental Material, Table S3. All steps, including the phylogenetic analyses described below, were performed on a multicore Linux workstation (40 CPUs and 512 Gb of memory).
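The taxon-occupancy filter described above (keep only alignments with data for at least 90% of taxa) can be sketched in a few lines. This is an illustration of the criterion, not the PHYLUCE script the authors ran; the function name, the input structure (locus name mapped to the set of taxa with data), and the toy loci are all hypothetical.

```python
def filter_by_occupancy(alignments, n_taxa, min_percent=90):
    """Keep loci whose alignment has data for >= min_percent of n_taxa.

    alignments: {locus_name: set of taxon names with sequence data}
    """
    # Integer ceiling of (min_percent / 100) * n_taxa, avoiding float rounding.
    min_taxa = (min_percent * n_taxa + 99) // 100
    return {
        locus: taxa
        for locus, taxa in alignments.items()
        if len(taxa) >= min_taxa
    }

# Toy example with 10 taxa: a locus needs at least 9 of them to pass.
taxa = [f"t{i}" for i in range(10)]
toy = {
    "uce-1": set(taxa),       # 10/10 taxa
    "uce-2": set(taxa[:9]),   # 9/10 taxa
    "uce-3": set(taxa[:8]),   # 8/10 taxa, dropped
}
kept = filter_by_occupancy(toy, n_taxa=len(taxa))
print(sorted(kept))
```

With the 130 specimens mentioned in this dataset description, the same rule would require data for at least 117 taxa per alignment.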
Phylogenomic analysis: To partition the UCE data for phylogenetic analysis, we used the Sliding-Window Site Characteristics based on entropy method (SWSC-EN; Tagliacollo and Lanfear 2018), which breaks UCE loci into three regions, corresponding to the right flank, core, and left flank. The theoretical underpinning of the approach comes from the observation that UCE core regions are conserved, while the flanking regions become increasingly more variable (Faircloth et al. 2012). After running the SWSC-EN algorithm, the resulting data subsets were analyzed using PARTITIONFINDER2 (Lanfear et al. 2012, Lanfear et al. 2017). For this analysis we used the rclusterf algorithm, AICc model selection criterion, and the GTR+G model of sequence evolution. The resulting best-fit partitioning scheme included 1,126 data subsets and had a significantly better log likelihood than alternative partitioning schemes (SWSC-EN: -5,608,249.502; By Locus: -5,639,169.680; Unpartitioned: -5,731,679.666).
Using the SWSC-EN partitioning scheme, we inferred phylogenetic relationships of Syscia with the likelihood-based program IQ-TREE v1.5.5 (Nguyen et al. 2015). For the analysis we selected the “-spp” option for partitioning (linked branch lengths but allowing each partition to have its own evolutionary rate) and the GTR+F+G4 model of sequence evolution. To assess branch support, we performed 1,000 replicates of the ultrafast bootstrap approximation (UFB) (Minh et al. 2013, Hoang et al. 2018) and 1,000 replicates of the branch-based, SH-like approximate likelihood ratio test (Guindon et al. 2010). For these support measures, values ≥ 95% and ≥ 80%, respectively, signal that a clade is supported.
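The support thresholds above can be applied to node labels once a tree has been read. The sketch below assumes each label is a string of the form "SH-aLRT/UFBoot" (for example "87.3/98") and that a clade counts as supported only when both thresholds are met; both points are assumptions for illustration, and the function names are hypothetical.

```python
def parse_support(label):
    """Split an assumed 'SH-aLRT/UFBoot' node label into two floats."""
    sh_alrt, ufboot = (float(part) for part in label.split("/"))
    return sh_alrt, ufboot

def is_supported(label, ufboot_min=95.0, sh_alrt_min=80.0):
    """True when UFBoot >= 95 and SH-aLRT >= 80 (thresholds from the text)."""
    sh_alrt, ufboot = parse_support(label)
    return ufboot >= ufboot_min and sh_alrt >= sh_alrt_min

for label in ["87.3/98", "79.9/100", "90/94"]:
    print(label, is_supported(label))
```

If the two measures should be judged separately rather than jointly, the final line of is_supported is the only part that needs to change.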
COI barcode analysis: Due to the high abundance of mitochondrial DNA in samples and the less-than-perfect efficiency of target enrichment methods, Cytochrome Oxidase I (COI) sequence data, and sometimes entire mitochondrial genomes (see Ströher et al. 2016) are often generated as a byproduct of the UCE sequencing process. To provide a separate assessment of species identities, possibly with more samples included, we extracted COI sequences from our UCE enriched samples and combined them with Syscia COI sequences downloaded from the BOLD database (Ratnasingham and Hebert 2007) (Accessed 16 May 2019). To extract COI from UCE data, we downloaded a complete 658 bp barcode sequence of a Costa Rican Syscia specimen from BOLD (Process ID ACGAE095-10, identified by us as S. benevidesae, one of the new species in this work) and used this as the bait input sequence for a PHYLUCE program (phyluce_assembly_match_contigs_to_barcodes) that extracts COI sequence from bulk sets of contigs.
After extracting COI sequence from UCE sample data, we downloaded accessible barcode sequences from BOLD following a series of steps. First, using the BOLD workbench interface, we searched for all records matching the taxonomy search term “Syscia” or “Cerapachys”. We then copied all of the resulting Barcode Index Numbers (BINs) and performed a second search using these numbers in the identifiers field. This approach recovers taxonomically mislabeled samples because BINs group sequences into units by sequence similarity, not name (Ratnasingham & Hebert 2013). All returned sequences were downloaded, examined, and subsequently filtered to remove Old World specimens and entries with no sequence data. We also removed a misidentified sample from Madagascar and a sequence mined from GenBank that had no accompanying specimen data. Because some of the remaining sequences included private, unpublished data, we contacted data owners for permission to use the private sequences in our analyses.
We combined the final set of BOLD sequences with the successfully extracted COI sequences from UCE samples and aligned the data using MAFFT. We visually inspected the resulting alignment for signs of pseudogenes/numts (e.g. presence of stop codons, indels, or highly divergent sequence) or other anomalies using MESQUITE v3.51 (Maddison and Maddison 2018). The final matrix was partitioned by codon position and analyzed with IQ-TREE using GTR+F+G4, 1,000 ultrafast bootstrap replicates, and 1,000 SH-like replicates. Following a preliminary analysis of all samples, we discovered that a set of 79 putative “Cerapachys” samples actually belonged to the phylogenetically distinct genus Neocerapachys. Consequently, we removed these samples from our data set and updated determinations in BOLD. Sample information for the final set of 86 BOLD specimens included in our analysis is available in Supplemental Material, Table S4.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The supplement contains all materials (data, scripts, etc.) used in this publication.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The mean and standard error of the standard scores of assigning sequences to each protein family based on the emission matrix and similarity emission matrix.
https://spdx.org/licenses/CC0-1.0.html
Hyptidinae, ca. 400 species, is an important component of Neotropical vegetation formations. Members of the subtribe possess flowers arranged in variously modified bracteolate cymes and nutlets with an expanded areole and all share a unique explosive mechanism of pollen release, except for Asterohyptis. In a recent phylogenetic study, the group had its generic delimitations rearranged with the recognition of 19 genera in the subtribe. Although the previous phylogenetic analysis covered almost all the higher taxa in the subtribe, it lacked a broader sampling at the species level. Here we present a new expanded phylogenetic analysis for the subtribe comprising 153 accessions of Hyptidinae sequenced for the nuclear nrITS, nrETS, and waxy regions and the plastid markers trnL-F, trnS-G, trnD-T, and matK. Our results widely support the previous phylogenetic results with some changes in the support and relationship between genera. It also uncovers the need for a new combination of Eriope machrisae in Hypenia and the phylogenetic position of Hyptis sect. Rhytidea, which was demonstrated to be part of Mesosphaerum. The generic delimitation in Hyptidinae is discussed, and we recommend that further studies with more markers are needed to confirm the monophyly of Hyptidendron and Mesosphaerum, as well as to support taxonomic changes on the infrageneric delimitation within Hyptis s. s.
Methods. DNA Amplification and Sequencing: Total genomic DNA was extracted from fresh or silica-gel dried leaf material and sodium chloride/CTAB preserved material (Chase and Hills 1991) or from herbarium specimens. The fragments, when extracted from herbarium specimens, were removed from HUEFS and K collections. A rescaled version of the Doyle and Doyle (1987) protocol was used for genomic DNA extractions. We chose the nuclear ribosomal internal transcribed spacer region (nrITS), including both ITS1 and ITS2 and the intervening 5.8S, the nuclear ribosomal external transcribed spacer region (nrETS), and the nuclear low copy waxy granule-bound starch synthase I (GBSS). The plastid regions used were 3’trnK-matK (including partial trnK intron and matK coding region), trnL-F region (including the trnL intron and the trnL-trnF intergenic spacer), trnD-T and trnS-G region (including trnS-psbZ intergenic spacer, partial sequence; psbZ gene, complete cds; psbZ-trnG intergenic spacer, complete sequence; and tRNA-Gly (trnG) gene). The nrITS region was amplified using the primers 17SE and 26SE of Sun et al. (1994). The 3′ 18S-IGS primer of Baldwin and Markos (1998) and the 5′ primer ETS-B (Beardsley and Olmstead, 2002) were used to amplify a portion of the 3′ end of the nrETS. The nuclear region waxy was sequenced between GBSSI bd9f-bd11r, using the methodology designed by Drew and Sytsma (2013). For the GBSSI gene, we therefore used a nested PCR approach to amplify the region between (and including parts of) exons 7–11. The initial PCR reaction used the primers bd7f and bd12r (Drew and Sytsma 2013). The PCR product from the above amplification was then used (after 1:20 dilution) as a template for the additional PCR reaction, using the primers bd9f and bd11r. The product of this amplification was then sequenced with the same primers used in the nested PCR. The partial matK/trnK locus was amplified using 390F and 1326R (Cuénoud et al. 2002).
The whole trnL-trnF region was amplified using primers “c” and “f”, with the use of internal primers “d” and “e” for some problematic samples, of Taberlet et al. (1991). For amplifying the trnS-G spacer we used the set of primers described in Shaw et al. (2007). The spacer trnD-T was amplified for most taxa using the primers of Demesure et al. (1995), trnD GUC and trnT GGU. For some samples which could not be amplified using these primers, we used the internal primer trnY GUA (Shaw et al. 2005). The plastid loci were sequenced using the same set of primers used for the amplification, whereas the nuclear nrITS was sequenced using internal primers ITS92 (Desfeux and Lejeune 1996) and ITS4 (White et al. 1990) with the same PCR program.
All PCR amplifications were performed in a final volume of 10 µL containing: 5 µL of TopTaq master mix kit (Qiagen, Valencia, California), 2.25 pMol of each primer, 5–10 ng of genomic DNA, and ultrapure H2O (enough to complete the volume to 10 µL). For the ITS amplification, we added 2% DMSO (dimethyl sulfoxide) and 1 M betaine. All regions were amplified using initial denaturation at 94°C (5 min), 28 (ITS) or 32 (plastid loci) cycles of denaturation at 94°C (1 min), annealing at 52°C (ITS) or 54°C (plastid loci) (1 min), elongation at 72°C (2 min), and a final elongation of 4 min. Amplified products were purified using precipitation with an 11% solution of polyethylene glycol (PEG) 8000 and ethanol cleaning. Sequencing reactions in both directions were performed using BigDye Terminator 3.1 (Applied Biosystems, Carlsbad, California) chemistry and analyzed on an ABI3130XL sequencer (Applied Biosystems/Life Technologies Corporation, Carlsbad, California) following the manufacturer’s protocol at Universidade Estadual de Feira de Santana, Bahia, Brazil. Some PCR products were sequenced at the Interdisciplinary Center for Biotechnology Research at the University of Florida, Gainesville.
Sequence Assembly and Alignment—The sequences were edited using Geneious 6.1.8 (https://www.geneious.com) and aligned using the program Clustal2X (Larkin et al. 2007); alignments were checked by eye. Gaps were coded according to the "simple coding" criterion of Simmons and Ochoterena (2000) using the software Seqstate v.1.4.1 (Müller 2005).
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Previous phylogenetic studies in oaks (Quercus, Fagaceae) have failed to resolve the backbone topology of the genus with strong support. Here, we utilize next-generation sequencing of restriction-site associated DNA (RAD-Seq) to resolve a framework phylogeny of a predominantly American clade of oaks whose crown age is estimated at 23–33 million years old. Using a recently developed analytical pipeline for RAD-Seq phylogenetics, we created a concatenated matrix of 1.40 E06 aligned nucleotides, constituting 27,727 sequence clusters. RAD-Seq data were readily combined across runs, with no difference in phylogenetic placement between technical replicates, which overlapped by only 43–64% in locus coverage. 17% (4,715) of the loci we analyzed could be mapped with high confidence to one or more expressed sequence tags in NCBI Genbank. A concatenated matrix of the loci that BLAST to at least one EST sequence provides approximately half as many variable or parsimony-informative characters as equal-sized datasets from the non-EST loci. The EST-associated matrix is more complete (fewer missing loci) and has slightly lower homoplasy than non-EST subsampled matrices of the same size, but there is no difference in phylogenetic support or relative attribution of base substitutions to internal versus terminal branches of the phylogeny. We introduce a partitioned RAD visualization method (implemented in the R package RADami; http://cran.r-project.org/web/packages/RADami) to investigate the possibility that suboptimal topologies supported by large numbers of loci—due, for example, to reticulate evolution or lineage sorting—are masked by the globally optimal tree. We find no evidence for strongly-supported alternative topologies in our study, suggesting that the phylogeny we recover is a robust estimate of large-scale phylogenetic patterns in the American oak clade. 
Our study is one of the first to demonstrate the utility of RAD-Seq data for inferring phylogeny in a 23–33 million year-old clade.
Database that hosts experimental data from universal protein binding microarray (PBM) experiments (Berger et al., 2006) and their accompanying statistical analyses from prokaryotic and eukaryotic organisms, malarial parasites, yeast, worms, mouse, and human. It provides a centralized resource for accessing comprehensive data on the preferences of proteins for all possible sequence variants ("words") of length k ("k-mers"), as well as position weight matrix (PWM) and graphical sequence logo representations of the k-mer data. The database's web tools include a text-based search, a function for assessing motif similarity between user-entered data and database PWMs, and a function for locating putative binding sites along user-entered nucleotide sequences.
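To make the k-mer and position weight matrix (PWM) terminology above concrete, the sketch below enumerates all DNA "words" of length k and scores windows of a sequence against a PWM using log-odds relative to a uniform background. The toy PWM, the 0.25 background, and the function names are hypothetical and are not taken from the database or its web tools.

```python
from itertools import product
import math

# Hypothetical 3-column PWM: each column gives base probabilities.
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

def all_kmers(k, alphabet="ACGT"):
    """Enumerate every possible word of length k over the alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=k)]

def log_odds(kmer, pwm, background=0.25):
    """Sum of per-position log2(probability / background) for one k-mer."""
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, kmer))

def best_site(sequence, pwm):
    """Return (offset, k-mer, score) of the highest-scoring window."""
    k = len(pwm)
    windows = [(i, sequence[i:i + k]) for i in range(len(sequence) - k + 1)]
    offset, kmer = max(windows, key=lambda w: log_odds(w[1], pwm))
    return offset, kmer, log_odds(kmer, pwm)

print(len(all_kmers(3)))                 # number of possible 3-mers
print(best_site("TTACGTT", TOY_PWM)[:2]) # best window in a toy sequence
```

Scanning a longer nucleotide sequence for putative binding sites works the same way, typically with a score cutoff in place of taking only the single best window.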
Supplemental Figures: S1, S2, S3, S4, S5, S6 and S7
S1-S6: Phylograms using different percentiles of the ranked ortholog pairs of 46 species of Kinetoplastida protozoa (S1, S2 and S4), different cutoff for E-value (S3) and different methods for orthology inference (S5 and S6).
S7: Bar graph showing the total number of orthologs identified by the RSD and OrthoMCL algorithms for the 78 pairwise species combinations used in the analysis, based on 13 species with sequences retrieved from the TriTryp database, as indicated in Table 1 ("Proteins sequence source" column). Intersections (shared orthologs) and unique orthologs were calculated with gene ID lists as input using the Venn diagram tool (http://bioinformatics.psb.ugent.be/webtools/Venn/).
File name: Supplemental_Figures_S1_S2_S3_S4_S5_S6_S7.pdf
Supplemental Table 1: Kinetoplastida pairwise matrices
Excel spreadsheet containing resulting tables of pairwise orthologs data (pairwise matrices). Sheet "AA distance": aminoacid distance obtain...