Aim: To infer relationships between populations of the semi-arid mallee eucalypt Eucalyptus behriana, to build hypotheses regarding the evolution of major disjunctions in the species’ distribution, and to expand understanding of the biogeographical history of south-eastern Australia.
Location: South-eastern Australia.
Taxon: Eucalyptus behriana (Myrtaceae, Angiospermae).
Methods: We developed a large dataset of anonymous genomic loci for 97 samples from 11 populations of E. behriana using double-digest restriction site-associated DNA sequencing (ddRAD-seq) to determine genetic relationships between the populations. These relationships, along with species distribution models, were used to construct hypotheses regarding the environmental processes that have driven fragmentation of the species’ distribution.
Results: The greatest genetic divergence was between populations on either side of the Lower Murray Basin. Populations west of the Basin were more genetically divergent from one another than were the eastern populations. The most genetically distinct eastern population (Long Forest) was separated from the others by the Great Dividing Range. A close relationship was found between the outlying northernmost population (near West Wyalong) and those in the Victorian Goldfields, despite the large disjunction between them.
Conclusions: Patterns of genetic variation are consistent with a history of vicariant differentiation of disjunct populations. We infer that an early disjunction to develop in the species’ distribution was the one across the Lower Murray Basin, an important biogeographical barrier separating many dry sclerophyll plant taxa in south-eastern Australia.
Additionally, our results suggest that the western populations fragmented earlier than the eastern ones. This fragmentation, both west and east of the Murray Basin, was likely tied to climatic changes associated with glacial-interglacial cycles, although major geological events, including uplift of the Mount Lofty Ranges and basalt flows in the Newer Volcanics Province, may also have played a role.
Supplementary tables.docx - Tables containing information on ipyrad parameters and on the individual samples in the dataset used for analyses
Concatenated_alignment.nex - Nexus-format concatenated alignment of all loci generated by ipyrad, used in phylogenetic analyses
One_SNP_per_5000bp_Egrandis_reference_genepop.txt - Genepop-format file of the filtered SNP dataset, containing no more than one SNP per 5000 bp of the E. grandis reference genome used to assemble loci in ipyrad
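The Genepop file name indicates that SNPs were thinned to at most one per 5000 bp of the E. grandis reference. The thinning procedure itself is not described here; below is a minimal Python sketch of one common approach, a greedy pass that keeps a SNP only if it lies at least 5000 bp from the last SNP kept on the same scaffold (similar in spirit to VCFtools' `--thin` option). The function name and the scaffold names are hypothetical.

```python
from collections import defaultdict


def thin_snps(snps, min_dist=5000):
    """Greedily keep SNPs so that retained SNPs on the same scaffold are >= min_dist bp apart.

    snps: iterable of (scaffold, position) pairs, in any order.
    Returns the retained pairs sorted by scaffold, then position.
    """
    positions_by_scaffold = defaultdict(list)
    for scaffold, pos in snps:
        positions_by_scaffold[scaffold].append(pos)

    kept = []
    for scaffold in sorted(positions_by_scaffold):
        last_kept = None
        for pos in sorted(positions_by_scaffold[scaffold]):
            # Keep the first SNP on a scaffold, then only SNPs far enough from the last kept one.
            if last_kept is None or pos - last_kept >= min_dist:
                kept.append((scaffold, pos))
                last_kept = pos
    return kept


# Hypothetical SNP positions on two reference chromosomes.
snps = [("Chr01", 100), ("Chr01", 2500), ("Chr01", 5200), ("Chr01", 11000), ("Chr02", 40)]
print(thin_snps(snps))
# [('Chr01', 100), ('Chr01', 5200), ('Chr01', 11000), ('Chr02', 40)]
```

Because the pass is greedy from the start of each scaffold, a different starting point (or a random choice within windows) could retain a different, equally valid subset.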
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains genetic sequences obtained from Hybrid-Enrichment and RAD sequencing protocols for the amphibian genera Discoglossus, Lissotriton, Rana and Triturus, as well as phylogenetic trees inferred from the RADseq data. These data were generated for the manuscript "Exploring the impact of read clustering thresholds on RADseq-based systematics: an empirical example from European amphibians", in which we tested the influence of the clustering threshold used to assemble RADseq data on downstream phylogenetic inferences. Details on data generation and analyses can be found in the manuscript and the related supplementary materials.
The repository is organised as follows:
-> Hybrid-Enrichment: alignments of the Hybrid-Enrichment markers in phylip/fasta format (with one subdirectory for each of the four datasets assembled: Discoglossus, Lissotriton, Rana, Triturus)
-> RADseq: assemblies and phylogenetic trees obtained from a RADseq protocol
--> Assemblies: RADseq assemblies (complete locus sequences and SNP matrices, spreadsheets with assembly metrics). Divided into "iCT" (assemblies produced with 23 different intra-sample Clustering Thresholds [iCT] and a fixed between-sample Clustering Threshold [bCT]) and "bCT" (assemblies produced with a fixed iCT and 23 different bCTs). Both iCT and bCT are further divided into four subdirectories corresponding to the four datasets (Discoglossus, Lissotriton, Rana, Triturus).
--> Trees: phylogenetic trees inferred from the aforementioned assemblies. Divided into "iCT" (RAxML concatenation trees inferred from the assemblies with different iCTs) and "bCT" (RAxML concatenation trees and Tetrad species trees inferred from the assemblies with different bCTs).
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
Genotyping-by-sequencing data (raw reads and assembled/aligned data) for a conservation genomic study of the Corybas aconitiflorus species complex (Acianthinae, Diurideae, Orchidaceae). Reference: Natascha D. Wagner, Mark A. Clements, Lalita Simpson, Katharina Nargar: Conservation in the face of hybridisation: genome-wide study to evaluate taxonomic delimitation and conservation status of a threatened orchid species. Conservation Genetics (accepted manuscript).
Lineage:
Material: The dataset includes 72 samples: 70 from the Corybas aconitiflorus complex, namely C. aconitiflorus (24 samples, 9 localities), C. barbarae (32 samples, 5 localities) and C. dowlingii (14 samples, 2 localities), plus 2 samples of Corybas pruinosus. Sampling focused on the south-eastern distribution of the C. aconitiflorus complex. It extended from the restricted distribution of C. dowlingii (between Port Macquarie and Newcastle, New South Wales) ca. 300 km northwards to the border between New South Wales and Queensland (Uralba), ca. 1,200 km southwards to Tasmania (Ulverstone), and ca. 600 km eastwards to Lord Howe Island.
DNA extraction, ddRAD library preparation, and sequencing: Total DNA was extracted from silica-dried leaf material using a modified CTAB protocol (Weising et al. 2005). Double-digest restriction-site associated DNA (ddRAD) sequencing libraries were prepared following Peterson et al. (2012) with the enzyme combination PstI and NlaIII. Quality and reproducibility of libraries and DNA sequencing were assessed by running five samples in duplicate (6.9% of all samples). Multiplexed libraries were sequenced on one lane of a NextSeq500 platform (Illumina Inc., San Diego, CA, USA) as single-end 150 bp reads at the Australian Genome Research Facility (AGRF; Melbourne, Victoria, Australia).
Bioinformatics and data filtering: Quality of the sequence reads was examined using FastQC v.0.11.5 (Andrews 2010). Raw sequences were demultiplexed, trimmed and further processed using the ipyrad pipeline v.0.6.15 (Eaton and Overcast 2016). In an initial filtering step, reads with more than five low-quality bases (Phred quality score < 20) were excluded from the dataset. The Phred quality score offset was set to 33. The strict adapter trimming option was selected, and reads had to be at least 35 bp long after trimming to be retained in the dataset.
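The read-level filter described above (reads with more than five bases below Phred 20 removed, Phred offset 33, at least 35 bp after trimming) can be sketched in Python as follows. This is a simplified illustration rather than ipyrad's implementation: it assumes the read has already been adapter-trimmed and simply counts low-quality bases. Function names and reads are hypothetical.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (offset 33 = Sanger / Illumina 1.8+)."""
    return [ord(ch) - offset for ch in quality]


def passes_read_filter(seq, quality, min_q=20, max_low_qual=5, min_len=35, offset=33):
    """Return True if an (already adapter-trimmed) read is long enough and has
    no more than max_low_qual bases with a Phred score below min_q."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings must have the same length")
    if len(seq) < min_len:
        return False
    n_low = sum(1 for q in phred_scores(quality, offset) if q < min_q)
    return n_low <= max_low_qual


good = passes_read_filter("ACGT" * 10, "I" * 40)            # 'I' decodes to Phred 40
bad = passes_read_filter("ACGT" * 10, "#" * 6 + "I" * 34)   # six bases at Phred 2
short = passes_read_filter("ACGT" * 8, "I" * 32)            # 32 bp is below 35 bp
print(good, bad, short)  # True False False
```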
After these quality-filtering steps, reads were clustered within and across samples at 85% similarity using the vclust function in VSEARCH (Edgar 2010) and aligned using MUSCLE (Edgar 2004) as implemented in ipyrad. Clusters with fewer than six reads were excluded to ensure accurate base calls. The resulting clusters represent putative RAD loci shared across samples. A maximum of five uncalled bases (‘Ns’) and eight heterozygous sites (‘Hs’) was allowed in the consensus sequences. The maximum number of single nucleotide polymorphisms (SNPs) per locus was set to ten, and the maximum number of indels per locus to five. For the sample set including all accessions of the C. aconitiflorus complex plus two accessions of C. pruinosus as outgroup, ipyrad was run to generate two datasets: one based on loci shared by at least 20 individuals (m20) and one based on loci shared by at least 70 individuals (m70). The same settings were also used for ipyrad runs excluding the outgroup (C. pruinosus, 2 samples).
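The consensus-level and locus-level filters described above can be sketched as follows. This Python sketch is a simplified illustration, not ipyrad's code: it assumes heterozygous sites are encoded with the IUPAC two-base ambiguity codes, represents loci as a hypothetical nested dictionary, and omits the per-locus SNP and indel limits. Setting `min_samples` to 20 or 70 mirrors the m20 and m70 datasets.

```python
HETEROZYGOUS_CODES = set("RYSWKM")  # IUPAC two-base ambiguity codes, used here for heterozygous calls


def consensus_ok(consensus, max_ns=5, max_hs=8):
    """Check one sample's consensus sequence against the N and heterozygous-site limits."""
    seq = consensus.upper()
    n_count = seq.count("N")
    h_count = sum(1 for base in seq if base in HETEROZYGOUS_CODES)
    return n_count <= max_ns and h_count <= max_hs


def filter_loci(loci, min_samples):
    """Drop failing consensus sequences, then keep loci still shared by >= min_samples samples.

    loci: {locus_id: {sample_name: consensus_sequence}}
    """
    kept = {}
    for locus_id, consensus_by_sample in loci.items():
        passing = {s: c for s, c in consensus_by_sample.items() if consensus_ok(c)}
        if len(passing) >= min_samples:
            kept[locus_id] = passing
    return kept


# Toy data; with the real data min_samples would be 20 (m20) or 70 (m70).
loci = {
    "loc1": {"s1": "ACGTACGT", "s2": "ACGTRCGT", "s3": "NNNNNNNN"},  # s3 has 8 Ns
    "loc2": {"s1": "ACGTACGT", "s2": "NNNNNNAC"},                    # s2 has 6 Ns
}
print(filter_loci(loci, min_samples=2))
# {'loc1': {'s1': 'ACGTACGT', 's2': 'ACGTRCGT'}}
```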
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The reconstruction of relationships within recently radiated groups is challenging even when massive amounts of sequencing data are available. The use of restriction site-associated DNA sequencing (RAD-Seq) to this end is promising. Here, we assessed the performance of RAD-Seq in inferring the species-level phylogeny of the rapidly radiating genus Cereus (Cactaceae). To examine how the amount of genomic data affects resolution in this group, we used distinct datasets and implemented different analyses. We sampled 52 individuals of Cereus, representing 18 of the 25 currently recognized species, plus members of the closely allied genera Cipocereus and Praecereus and of 11 other Cactaceae genera as outgroups. Three scenarios of tolerance for missing data were implemented in iPyRAD, assembling datasets with 30% (333 loci), 45% (1440 loci), and 70% (6141 loci) missing data. For each dataset, Maximum Likelihood (ML) trees were generated using two supermatrices, i.e., SNPs only and SNPs plus invariant sites. Accuracy and resolution improved when the dataset with the highest number of loci (6141 loci) was used, despite its high percentage of missing data (70%). Coalescent trees estimated using SVDQuartets and ASTRAL are similar to those obtained by the ML reconstructions. Overall, we reconstruct a well-supported phylogeny of Cereus, which is resolved as monophyletic and composed of four main clades with high support for their internal relationships. Our findings also provide insights into the impact of missing data on phylogeny reconstruction using RAD loci.
Sampling: Our dataset includes 63 samples: 52 Cereus ingroup samples and 11 outgroups (Table 1).
ddRAD library preparation and sequencing: Genomic DNA was extracted from root tissues using the DNeasy Plant Mini Kit (Qiagen). ddRAD libraries were prepared using high-fidelity EcoRI and HpaII restriction enzymes following Campos et al. (2017) and Khan et al. (2019).
Details of library preparation and sequencing are given in the Supplementary material.
Bioinformatics analyses: Raw data were trimmed for adapters and quality-filtered before SNP calling. The quality of sequencing data was checked with FastQC 0.11.2 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc), visualized in MultiQC 1.0 (https://github.com/ewels/MultiQC), and filtered with SeqyClean 1.9.12 (Zhbannikov et al., 2017) using the following settings: minimum quality (Phred score 20), minimum size (>65 bp), and Illumina contaminants (UniVec.fas). We used the iPyRAD pipeline (available at http://github.com/dereneaton/ipyrad) to identify homology among reads, make SNP calls, and format output files. The following parameter settings were implemented: mindepth_majrule = 6 (minimum depth for majority-rule base calling), clust_threshold = 0.85 (clustering threshold for de novo assembly), filter_adapters = 2 (strict filter), max_Hs_consens = 6 (maximum heterozygotes in consensus), and min_samples_locus (minimum number of samples per locus for output). Values of the latter varied across three scenarios of tolerance for missing data, which required each locus in the final set to be present in at least 39 samples (scenario 1, approximately 30% missing data), 26 samples (scenario 2, approximately 45% missing data), or 13 samples (scenario 3, approximately 70% missing data). After SNP calling, CD-HIT (Li and Godzik, 2006; Fu et al., 2012) was used to identify reverse-complement duplicates among the loci recovered by iPyRAD.
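Reverse-complement duplicates can arise when the same locus is recovered from opposite strands. CD-HIT detects duplicates through similarity clustering; the Python sketch below covers only the simplest, exact-match case, by giving a sequence and its reverse complement the same canonical key. Function names and toy sequences are hypothetical.

```python
# Complement table including IUPAC ambiguity codes (R<->Y, K<->M; S, W and N map to themselves).
COMPLEMENT = str.maketrans("ACGTRYKMSWN", "TGCAYRMKSWN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_strand_duplicates(loci):
    """Return (first_locus, duplicate_locus) pairs whose sequences are identical on either strand.

    loci: {locus_id: consensus_sequence}. Exact matches only.
    """
    first_seen = {}  # canonical sequence -> first locus id carrying it
    pairs = []
    for locus_id, seq in loci.items():
        seq = seq.upper()
        # A sequence and its reverse complement share the same canonical key.
        canonical = min(seq, reverse_complement(seq))
        if canonical in first_seen:
            pairs.append((first_seen[canonical], locus_id))
        else:
            first_seen[canonical] = locus_id
    return pairs


loci = {"L1": "AACCGT", "L2": "ACGGTT", "L3": "GGGCCA"}
print(find_strand_duplicates(loci))  # [('L1', 'L2')]
```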
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Number of putative loci identified across all samples for each ipyrad run and number retained after each filtering step.
Restriction site-associated DNA sequencing (RADseq) has great potential for genome-wide systematics studies of non-model organisms. However, accurately assembling RADseq reads into orthologous loci remains a major challenge in the absence of a reference genome. Traditional assembly pipelines cluster putative orthologous sequences based on a user-defined clustering threshold. Because improper clustering of orthologs is expected to affect results in downstream analyses, it is crucial to design pipelines for empirically optimizing the clustering threshold. While this issue has been widely discussed from a population genomics perspective, it remains understudied in the context of phylogenomics and coalescent species delimitation. To address this issue, we generated RADseq assemblies of representatives of the amphibian genera Discoglossus, Rana, Lissotriton and Triturus using a wide range of clustering thresholds. In particular, we studied the effects of the intra-sample Clustering Threshold...
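To illustrate why the choice of clustering threshold matters, the following toy Python sketch clusters reads greedily by percent identity at three thresholds. It is a simplification under stated assumptions, not how ipyrad or VSEARCH cluster reads: reads are assumed to be of equal length and already aligned, so identity is simply the proportion of matching positions, and the reads and function names are hypothetical. The same logic applies whether the threshold is used within a sample (iCT) or across samples (bCT).

```python
def identity(a, b):
    """Proportion of identical positions between two equal-length, gap-free reads."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length, gap-free reads")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads, threshold):
    """Greedy centroid clustering: each read joins the first cluster whose seed it matches
    at >= threshold identity; otherwise it seeds a new cluster."""
    clusters = []
    for read in reads:
        for cluster in clusters:
            if identity(read, cluster[0]) >= threshold:
                cluster.append(read)
                break
        else:  # no existing cluster was similar enough
            clusters.append([read])
    return clusters


reads = ["ACGTACGTAC", "ACGTACGTAA", "ACGTTCGTAA", "TTTTTTTTTT"]
for t in (0.75, 0.85, 0.95):
    print(t, len(greedy_cluster(reads, t)))
# 0.75 2
# 0.85 3
# 0.95 4
```

With these four reads, a permissive threshold (0.75) lumps the three similar reads together, whereas a strict one (0.95) splits reads that differ at a single site: the over- and under-merging that threshold optimization aims to avoid.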
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The ipyrad dataset contains an alignment of 19,413 ddRAD loci from 35 Empria specimens. The script takes the ipyrad dataset as input and produces a table with one row per locus and one column per specimen, in which each cell lists the specimens identical to the column's specimen at that locus (the cell is empty if no specimen is identical to it at that locus). Additional columns give, for each locus, the specimens identical between the two groups, the number of specimens, and the maximum, median and mean divergence. The two groups examined are the longicornis group (including E. tridentis, which is taxonomically not a member of the group but is closely related) and the immersa group. For both groups and for every locus, the script records specimens that are identical to any member of the other group while differing from all specimens in their own group. The second table produced by the script lists the specimens in the dataset, their number of loci, and their normalised number of loci. Normalised numbers of loci were calculated as half of the maximum number of loci divided by the number of loci of a particular specimen in the dataset. The script then produces bar plots (as PDF) for every specimen, showing the percentage of loci and the normalised percentage of loci at which that specimen is identical to a particular specimen while different from all others. Two additional bar plots, for the longicornis and immersa groups, show the percentage of loci at which a particular specimen is identical to any specimen in the wrong group while differing from the specimens in its own group.
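The script itself is not included in this description. Below is a minimal Python sketch of its core comparison logic as described, assuming each locus is available as a mapping from specimen name to aligned sequence and treating sequences as identical only when they match character for character (gaps and ambiguity codes included). The function names and the specimen names (`lon*` for the longicornis group, `imm*` for the immersa group) are hypothetical.

```python
from collections import defaultdict


def identical_specimens(loci):
    """For every locus and specimen, list the other specimens with an identical sequence.

    loci: {locus_id: {specimen: sequence}}; specimens lacking a locus are simply absent.
    """
    table = {}
    for locus_id, seqs in loci.items():
        carriers = defaultdict(list)  # sequence -> specimens carrying it
        for specimen, seq in seqs.items():
            carriers[seq].append(specimen)
        table[locus_id] = {
            specimen: sorted(s for s in carriers[seq] if s != specimen)
            for specimen, seq in seqs.items()
        }
    return table


def wrong_group_matches(loci, own_group, other_group):
    """Per locus, specimens of own_group identical to at least one specimen of other_group
    but different from every other own_group specimen sequenced at that locus."""
    result = {}
    for locus_id, seqs in loci.items():
        hits = []
        for specimen in own_group:
            if specimen not in seqs:
                continue
            seq = seqs[specimen]
            matches_other = any(seqs.get(o) == seq for o in other_group)
            matches_own = any(seqs.get(m) == seq for m in own_group if m != specimen)
            if matches_other and not matches_own:
                hits.append(specimen)
        result[locus_id] = hits
    return result


loci = {
    "loc1": {"lon1": "ACGT", "lon2": "ACGA", "imm1": "ACGT", "imm2": "TCGT"},
    "loc2": {"lon1": "GGCC", "lon2": "GGCC", "imm1": "GGCA"},
}
print(identical_specimens(loci)["loc1"])
# {'lon1': ['imm1'], 'lon2': [], 'imm1': ['lon1'], 'imm2': []}
print(wrong_group_matches(loci, ["lon1", "lon2"], ["imm1", "imm2"]))
# {'loc1': ['lon1'], 'loc2': []}
```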
The use of species as a concept is an important metric for assessing biological diversity and ecosystem function. However, delimiting species based on morphological characters can be difficult, especially in aquatic plants that exhibit high levels of variation and overlap. The Sphagnum cuspidatum complex, which includes plants that dominate peatland hollows close to or at the water table, provides an example of the challenges in species delimitation. The microscopic characters that have been used to define taxa, and the possibility that these characters are simply phenotypically plastic responses to variation in water availability, make species delimitation in this group especially difficult. In particular, the use of leaf shape and serration to separate species in the complex has resulted in divergent taxonomic treatments. Using a combination of high-resolution population genomic data (RADseq) and a robust morphological assessment of plants representing the focal species, we pro...
File cuspidatum_demultiplexed_illumina_reads_20230315.tar.gz: Zipped folder with 135 files of demultiplexed Illumina reads for the Sphagnum samples included in the analyses.
File cuspidatum_dataset_all_135_samples.phy: Phylip-format alignment of 8367 RADseq loci generated by ipyrad for 135 Sphagnum samples.
File cuspidatum_dataset_all_135_samples.loci: Loci-format file of 8367 RADseq loci generated by ipyrad for 135 Sphagnum samples.
File cuspidatum_plastid_loci_52_samples: Fasta-format alignment of 5 concatenated plastid loci generated by ipyrad and identified by mapping to the Sphagnum fallax reference genome (52 samples).
File cuspidatum_dataset_ingroup_57_samples.ustr: Structure-format file (two lines per sample, treating S. torreyanum and S. mississippiense samples as diploid and the remaining species as haploid) with one randomly selected SNP per locus generated by ipyrad for 57 Sphagnum samples.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
All output files generated by running the raw GBS data through the ipyrad pipeline.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
"alignments" folder contains concatenated ddRAD phylip alignments produced by iPyrad. mXXp naming scheme consistent with iPyrad parameter files.
"bridledCOB.nex" contains the DNA sequence alignment for the mtDNA gene cytochrome b used in the Bayesian phylogenetic analysis.
"bridledCOB.nex.con.tre" summarized posterior tree from Bayesian analysis of the mtDNA cytochrome b data.
"bridledNUC.nex" contains the DNA sequence alignment for the 11 nuclear genes used in the Bayesian phylogenetic analysis.
"bridledNUC.nex.con.tre" summarized posterior tree from Bayesian analysis of the 11 nuclear genes.
"fastqs" folder contains demultiplexed fastq files containing ddRAD reads for each individual.
"iPyrad_paramFiles" folder contains assembly parameters for iPyrad. mXXp indicates the minimum proportion of samples per locus (min_samples_locus iPyrad parameter). For example, m70p indicates that each locus is represented by at least 70% of the samples.
"P_freemanorum_meristic_data.csv" meristic data for Percina freemanorum.
"P_freemanorum_meristic_specimen_info.csv" information associated with specimens of Percina freemanorum.
"P_kusha_meristic_data.csv" meristic data for Percina kusha.
"P_kusha_meristic_specimen_info.csv" information associated with specimens of Percina kusha.
"IQTree" folder contains bash script to run IQTree analyses and resulting treefiles. mXXp naming scheme consistent with iPyrad parameter files.
"VCFs" folder contains bash script to run VCFTools filtering, outgroup file listing outgroup taxa to prune, input VCF file from iPyrad, and filtered VCF files. Unlinked indicates that SNPs have been pruned to include only one SNP per ddRAD locus. mXXp naming scheme consistent with iPyrad parameter files.
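The mXXp labels state a proportion of samples, whereas the min_samples_locus parameter is a minimum number of samples. A small Python sketch of the conversion follows, assuming the count is rounded up so that the stated percentage is always met (the rounding rule actually used is not stated here). The function name and the sample size in the example are hypothetical.

```python
import re


def min_samples_from_label(label, n_samples):
    """Convert an 'mXXp' label (minimum percentage of samples per locus) into an integer
    min_samples_locus value, rounding up so the stated percentage is always met."""
    match = re.fullmatch(r"m(\d+)p", label)
    if match is None:
        raise ValueError(f"not an mXXp label: {label!r}")
    percent = int(match.group(1))
    # Integer ceiling of percent * n_samples / 100, avoiding floating-point rounding.
    return (percent * n_samples + 99) // 100


print(min_samples_from_label("m70p", 21))  # 15
```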
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Biological control agents have several advantages over chemical control for pest management, including the capability to restore ecosystem balance with minimal non-target effects and a lower propensity for targets to develop resistance. These factors are particularly important in invasive species control. The coconut rhinoceros beetle (Oryctes rhinoceros Linnaeus) is a major palm pest that invaded many Pacific islands in the early 20th century through human-mediated dispersal. Application of the Oryctes nudivirus in the 1960s successfully halted the beetle's first invasion wave and made it a textbook example of successful biological control. However, a recently discovered O. rhinoceros biotype that is resistant to the nudivirus appears to be correlated with a new invasion wave. We performed a population genomics analysis of 172 O. rhinoceros from seven regions, including native and invasive populations, to reconstruct invasion pathways and explore correlations between recent invasions and biotypes. From ddRAD sequencing, we generated datasets ranging from 4,000 to 209,000 loci using the Stacks and ipyrad software pipelines and compared the genetic signal in downstream clustering and phylogenetic analyses. Our analysis suggests that the O. rhinoceros resurgence is mediated by the nudivirus-resistant biotype. Genomic data have proven essential to understanding the new O. rhinoceros biotype, its invasion patterns, and its interactions with the original biotype. Such information is crucial for optimizing quarantine and control strategies for resurgent pests. Our results demonstrate that while invasions are relatively rare events, new introductions can have significant ecological consequences, and quarantine vigilance is required even in previously invaded areas.
The supplementary material consists of two sets of files:
'caps_0_95_min141_max005_outfiles' contains the raw output files of the ipyrad analysis of a taxon sample comprising all 282 accessions of the five investigated Capsella species.
0_95 refers to ## [14] [clust_threshold]: Clustering threshold for de novo assembly in the ipyrad parameter file.
min141 refers to ## [21] [min_samples_locus]: Min # samples per locus for output in the ipyrad parameter file.
max005 refers to ## [22] [max_SNPs_locus]: Max # SNPs per locus in the ipyrad parameter file.
'ori_0_95_min117_max005_outfiles' contains the raw output files of the ipyrad analysis of a taxon sample comprising 235 investigated Capsella orientalis accessions.
0_95 refers to ## [14] [clust_threshold]: Clustering threshold for de novo assembly in the ipyrad parameter file.
min117 refers to ## [21] [min_samples_locus]: Min # samples per locus for output in the ipyrad parameter file.
max005 refers to the ## ...
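Because the directory names encode parameter values, they can be parsed programmatically. A minimal Python sketch, assuming the `<taxon set>_<clust_threshold>_min<N>_max<N>_outfiles` pattern of the two names above; the `max005` token is returned unchanged because the description does not state whether it means 5 SNPs or a proportion of 0.05. The function name is hypothetical.

```python
import re

RUN_NAME = re.compile(
    r"(?P<taxon_set>[a-z]+)_(?P<clust>\d+_\d+)_min(?P<min_samples>\d+)_max(?P<max_snps>\d+)_outfiles"
)


def parse_run_name(name):
    """Split an ipyrad output-directory name into the parameter values it encodes."""
    match = RUN_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected directory name: {name!r}")
    return {
        "taxon_set": match["taxon_set"],
        "clust_threshold": float(match["clust"].replace("_", ".")),  # '0_95' -> 0.95
        "min_samples_locus": int(match["min_samples"]),
        # Kept as text: whether '005' stands for 5 SNPs or a proportion of 0.05 is not stated.
        "max_SNPs_locus": match["max_snps"],
    }


print(parse_run_name("caps_0_95_min141_max005_outfiles"))
# {'taxon_set': 'caps', 'clust_threshold': 0.95, 'min_samples_locus': 141, 'max_SNPs_locus': '005'}
```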
We sampled tissue from 23 Phytotoma rutila individuals (Table S1.1) collected across the species’ entire breeding and altitudinal range (Fig. 2a). We purified genomic DNA from muscle or blood with the DNeasy Blood & Tissue purification kit (Qiagen). We sampled genomic markers by double-digest restriction site-associated DNA sequencing (ddRADseq; Peterson, Weber, Kay, Fisher, & Hoekstra, 2012), following the approach outlined by Peterson et al. (2012) and described in Thrasher, Butcher, Campagna, Webster, & Lovette (2018). Briefly, we digested samples with SbfI and MspI (New England Biolabs, MA) and ligated the digested DNA to adapters on both the 5’ and 3’ ends that allow multiplexing. The 5’ adapters included barcodes unique to each sample, while the 3’ barcode was common to groups of 20 samples. We pooled samples with unique 3’ barcodes and selected DNA fragments in the 400-700 bp range. The libraries were enriched, and the TruSeq adapters incorporated, by performing 13 cycles of PCR. Finally, libraries were combined in equimolar proportions and sequenced on one Illumina HiSeq 2500 lane at the Cornell University Institute for Biotechnology, yielding single-end 101 bp sequences. We demultiplexed, trimmed and filtered reads, assembled loci, and called single nucleotide polymorphisms (SNPs) with ipyrad 0.7.28 (Eaton & Overcast, 2020). Parameters of the ipyrad pipeline and of further SNP filtering using VCFtools (Danecek et al., 2011) are given in Table S1.2. We used the ipyrad output files of a first dataset of 23 samples for exploratory analyses such as STRUCTURE, maximum likelihood reconstruction, and PCA, and a second dataset of 21 samples (see below) for the downstream analyses. To obtain the second dataset, we removed two samples with excessive missing data (> 78%; MACN-Or-Ct 4079 and MACN-Or-Ct 2766) and reran ipyrad on the remaining 21 individuals.
We then filtered SNPs further using VCFtools (Danecek et al., 2011). Our criteria for obtaining the final RADseq dataset were: (a) remove sites with a quality score below 30; (b) exclude sites with more than 80% missing data (N); (c) remove low-frequency SNPs by setting a minor allele frequency threshold of 0.05; (d) keep only biallelic SNPs; (e) set a maximum depth of coverage of 300x (the mean SNP depth in our samples plus one standard deviation) and a minimum depth of coverage of 6x. This filter set was used to remove SNPs that might be the product of sequencing errors, as these can bias inferences of demographic processes (Linck & Battey, 2019; O’Leary, Puritz, Willis, Hollenbeck, & Portnoy, 2018; Willis, Hollenbeck, Puritz, Gold, & Portnoy, 2017). We then randomly selected one SNP per locus, obtaining a final dataset of 4,893 unlinked SNPs (n = 21 samples; maximum missing rate per individual, 31.6%; average depth across all loci, 89.1). We used this final ddRADseq dataset for all remaining genetic analyses except those stated above. We also used the ipyrad .loci output file for the G-PhoCS and DXY-FST analyses.

Aim: Some species show significant differences in morphological, ecologically related traits along environmental gradients. These differences are commonly related to past events of allopatry but could alternatively be caused by natural selection in the presence of gene flow. We aimed to explore the prevalence of the divergence-with-gene-flow model across the Chaco-Andes dry forest belt by testing competing models of evolution in a Neotropical bird.
Location: Central Andes Mountain range and Chaco region of Argentina and Bolivia.
Taxon: Phytotoma rutila (Aves, Cotingidae).
Methods: We studied ddRADseq loci (4,893 SNPs) from 21 tissue samples and body size variation in 146 specimens.
We evaluated population genetic structure and tested the effects of altitude and distance on genomic divergence. To distinguish between allopatry and divergence-with-gene-flow, we compared divergence in phenotypic traits (bill, tarsus, and wing measurements) with neutral genomic variation, conducted coalescent analyses to estimate gene flow and divergence times among populations, and calculated relative (FST) versus absolute (DXY) genomic divergence.
Results: (a) genomic and phenotypic differentiation in P. rutila matched the highland-lowland axis, with altitude explaining genomic variation; (b) phenotypic variation was larger than neutral genomic variation; (c) gene flow between populations was asymmetric; (d) the pattern of relative and absolute genomic differentiation was compatible with divergence-with-gene-flow.
Main conclusions: The mechanism behind the morphological and genomic diversification along the Chaco-Andes dry forest belt in P. rutila is divergence-with-gene-flow. Far more complex than we traditionally thought, diversification in South America i...
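Two steps of the workflow above can be sketched in Python: the SNP site filters (a)-(e) and the comparison of relative (FST) with absolute (DXY) divergence. This is a simplified illustration under stated assumptions, not the authors' VCFtools or analysis workflow: the `Site` record is hypothetical, the depth limits are applied to each site's mean depth (the text does not say whether they were applied per site or per genotype), and FST is Hudson's estimator computed as a ratio of averages without sample-size corrections. DXY is expressed per sequenced site, including invariant ones, so a file that retains invariant sites, such as ipyrad's .loci output, is generally needed for it.

```python
from dataclasses import dataclass
from typing import List, Optional


# --- Part 1: SNP site filters (a)-(e) ---
@dataclass
class Site:
    locus: str
    qual: float                     # site quality score (VCF QUAL)
    genotypes: List[Optional[int]]  # ALT-allele count per diploid sample (0, 1, 2); None = missing
    n_alleles: int                  # number of distinct alleles observed at the site
    mean_depth: float               # mean read depth across samples


def passes_filters(site, min_qual=30, max_missing=0.8, min_maf=0.05, min_depth=6, max_depth=300):
    """Return True if a site passes criteria (a)-(e); depth is checked on the site mean."""
    if site.qual < min_qual:                                    # (a) quality score
        return False
    missing = sum(g is None for g in site.genotypes) / len(site.genotypes)
    if missing > max_missing:                                   # (b) missing data
        return False
    if site.n_alleles != 2:                                     # (d) biallelic only
        return False
    called = [g for g in site.genotypes if g is not None]
    if not called:
        return False
    alt_freq = sum(called) / (2 * len(called))
    if min(alt_freq, 1 - alt_freq) < min_maf:                   # (c) minor allele frequency
        return False
    return min_depth <= site.mean_depth <= max_depth            # (e) depth window


# --- Part 2: relative (Hudson FST) versus absolute (DXY) divergence ---
def dxy_site(p1, p2):
    """Probability that two alleles, one drawn from each population, differ at a biallelic site."""
    return p1 * (1 - p2) + p2 * (1 - p1)


def pi_site(p):
    """Within-population diversity at a biallelic site (no small-sample correction)."""
    return 2 * p * (1 - p)


def hudson_fst(freqs1, freqs2):
    """Ratio-of-averages Hudson FST over variable sites: 1 - within / between."""
    within = sum((pi_site(a) + pi_site(b)) / 2 for a, b in zip(freqs1, freqs2))
    between = sum(dxy_site(a, b) for a, b in zip(freqs1, freqs2))
    return 1 - within / between


def mean_dxy(freqs1, freqs2, n_sites_total):
    """DXY per sequenced site: invariant sites add nothing to the sum but count in n_sites_total."""
    return sum(dxy_site(a, b) for a, b in zip(freqs1, freqs2)) / n_sites_total


# Hypothetical sites and allele frequencies, for illustration only.
sites = [
    Site("loc1", 45.0, [0, 1, 2, 0, None], 2, 40.0),   # passes all filters
    Site("loc2", 20.0, [0, 1, 2, 0, 1], 2, 40.0),      # fails (a): quality < 30
    Site("loc3", 45.0, [0] * 11 + [1], 2, 40.0),       # fails (c): MAF = 1/24
    Site("loc4", 45.0, [0, 1, 2, 0, 1], 2, 350.0),     # fails (e): depth > 300
]
print([s.locus for s in sites if passes_filters(s)])   # ['loc1']

highland = [0.0, 0.5, 0.9]   # ALT-allele frequencies at three variable sites
lowland = [1.0, 0.5, 0.1]
print(round(hudson_fst(highland, lowland), 3))         # 0.707
print(round(mean_dxy(highland, lowland, 100), 4))      # 0.0232
```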
Zipped folder with 88 files of demultiplexed Illumina reads for the Sphagnum samples included in the analyses.
Phylip-format alignment of 6692 RADseq loci generated by ipyrad for 88 Sphagnum samples.
Loci-format file of 6692 RADseq loci generated by ipyrad for 88 Sphagnum samples.
Fasta-format alignment of 8 concatenated plastid loci generated by ipyrad and identified by mapping to the Sphagnum angustifolium reference genome (76 samples).
Structure-format file (two lines per sample, treating S. majus samples as diploid and the remaining species as haploid) with one randomly selected SNP per locus generated by ipyrad for 78 samples of ingroup S. majus and putative parental species.
Structure-format file (two lines per sample, treating S. majus samples as diploid) with one randomly selected SNP per locus generated by ipyrad for 63 ingroup S. majus samples.
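SNPs within the same short RAD locus are tightly linked, and Structure-style analyses generally assume independent markers, which is why one SNP per locus is retained. A minimal Python sketch of random selection of one SNP per locus, assuming SNPs are given as hypothetical (locus, position) pairs; a fixed seed makes the subsample reproducible. The function name is hypothetical.

```python
import random
from collections import defaultdict


def one_snp_per_locus(snps, seed=None):
    """Pick one SNP at random from each locus to obtain a set of putatively unlinked SNPs.

    snps: iterable of (locus_id, position_within_locus) pairs.
    Returns {locus_id: chosen_position}.
    """
    rng = random.Random(seed)  # seeded generator so the subsample can be reproduced
    positions_by_locus = defaultdict(list)
    for locus_id, pos in snps:
        positions_by_locus[locus_id].append(pos)
    return {locus: rng.choice(sorted(positions)) for locus, positions in sorted(positions_by_locus.items())}


snps = [("loc1", 12), ("loc1", 57), ("loc1", 88), ("loc2", 30), ("loc3", 5), ("loc3", 71)]
chosen = one_snp_per_locus(snps, seed=1)
print(len(chosen))  # 3
```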
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Supplementary data for "Testing the efficacy of different molecular tools for parasite conservation genetics: a case study using horsehair worms (Phylum Nematomorpha)"
alignments: alignments used for BEAST ("bayes") and PopArt ("popart"). The "popart" folder also has a traits file for each species.
bayesian_plots: TSVs ("tsv") and PDF files ("ogs") generated by BEAST. The "tsv" folder also has the scripts for plotting the results in R.
easysfs: scripts, population file and results from the VCF-to-SFS conversion done with easySFS.
fineRADstructure: fineRADstructure input files and output PDF plots ("plots") for C. formosanus ipyrad and Stacks ("stacks") data.
logs: logs for ipyrad, ModelTest, PGDSpider, PopArt ("popart") and Stacks ("stacks"). The "popart" folder also has the generated networks in a TXT file. The "stacks" folder also has ODS files for calculating the number of loci for each fixed M/n value.
snapclust: STR files used with R for snapclust. Scripts included.
stairway_plot: input (blueprint files) and outputs for Stairway Plot 2 analyses. The C. formosanus folder ("chordodes") also has scripts for R plotting.
vcfs: VCF and HDF5 files used in this study, plus scripts for filtering/converting data and for plotting the PCA with ipyrad (activate Python first!) for C. formosanus.
"acutogordius" = A. taiwanensis "chordodes" = C. formosanus "gordius" = G. chiashanus
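easySFS converts a VCF into a site frequency spectrum (SFS). A minimal Python sketch of the core tabulation, building a folded SFS from per-site alternate-allele counts, is shown below; it assumes biallelic sites with no missing data and omits the down-projection that easySFS uses to handle missing genotypes. The function name and counts are hypothetical.

```python
def folded_sfs(alt_counts, n_chrom):
    """Folded site frequency spectrum.

    alt_counts: ALT-allele count per biallelic site, each observed in n_chrom sampled chromosomes.
    Entry k of the result is the number of sites whose minor-allele count is k (0 <= k <= n_chrom // 2).
    """
    sfs = [0] * (n_chrom // 2 + 1)
    for alt in alt_counts:
        if not 0 <= alt <= n_chrom:
            raise ValueError(f"allele count {alt} outside 0..{n_chrom}")
        sfs[min(alt, n_chrom - alt)] += 1  # fold: count the rarer allele
    return sfs


# Hypothetical counts for three diploid individuals (6 chromosomes) with no missing data.
print(folded_sfs([0, 1, 5, 2, 3, 6, 4], n_chrom=6))  # [2, 2, 2, 1]
```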
A fundamental objective of evolutionary biology is to understand the origin of independently evolving species. Phylogenetic studies of species radiations are rarely able to document ongoing speciation; instead, modes of speciation, entailing geographic separation and/or ecological differentiation, are posited retrospectively. The Oreinotinus clade of Viburnum has radiated recently from north to south through the cloud forests of Mexico and Central America to the Central Andes. Our analyses support a hypothesis of incipient speciation in Oreinotinus at the southern edge of its geographic range, from central Peru to northern Argentina. Although several species and infraspecific taxa have been recognized in this area, multiple lines of evidence and analytical approaches (including analyses of phylogenetic relationships, genetic structure, leaf morphology, and climatic envelopes) favor the recognition of just a single species, V. seemenii. We show that what has previously been recognized...

We collected leaf tissues from herbarium samples or from our own recent collections of these plants from central Peru to southern Bolivia. We included five samples previously classified as V. incarum, 29 as V. seemenii, and 13 as “New Name 2”. Whenever possible, multiple individuals per population were included, and all collections were deposited in the Yale Herbarium (YU) except for three specimens located in NY and MO (Appendix 1). Total DNA was extracted from leaf tissues using DNeasy plant extraction kits (Qiagen Inc., Hilden, Germany). RAD-seq data were generated by Floragenex Inc. (Eugene, Oregon) by digesting genomic DNA with the PstI restriction enzyme, followed by sonication and size selection at 400 bp. Samples were ligated with 10 bp barcodes for multiplexing, then pooled and sequenced on an Illumina HiSeq 2500 or 4000 to produce 100 bp single-end (SE) reads.
Samples were demultiplexed and assembled into orthologous loci with ipyrad v.0.9.85 (Eaton and Overcast 2020) using a refere...

ipyrad is needed to open the HDF5 files.

# Data from: Caught in the act: Incipient speciation at the southern limit of Viburnum in the Central Andes
In this repository, we are storing the following:
1. HDF5 file of the assembly produced by ipyrad from our RADseq reads (bolivia_history.seqs.hdf5)
2. HDF5 file with the SNPs found in item 1 (bolivia_history.snps.hdf5)
3. Phylip file containing the final alignment used for the tree reconstruction (10-bolivia-initial_mcov0.25_rcov0.1_ALLscaff_SelectiveSampling.phy)
The HDF5 files were produced by the ipyrad software (https://ipyrad.readthedocs.io/en/master/) and are organized databases containing not only the DNA sequences but also additional information used by the software for further analyses. As a cross-compatible alternative, we provide a Phylip version of the final alignment.
For additional reproducibility material (scripts, notebooks) check: https://github.com/camayal/southern-oreinotinus
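For users without ipyrad, the Phylip alignment can be read with a few lines of Python. A minimal sketch, assuming the file is sequential (non-interleaved) relaxed PHYLIP with an `n_taxa n_sites` header followed by one `name sequence` line per sample; the function name and the example alignment are hypothetical.

```python
def read_phylip(text):
    """Parse a sequential, relaxed PHYLIP alignment into {sample_name: sequence}.

    Expects a header line 'n_taxa n_sites' followed by one 'name<whitespace>sequence'
    line per taxon (interleaved PHYLIP is not handled).
    """
    lines = [line for line in text.splitlines() if line.strip()]
    n_taxa, n_sites = (int(x) for x in lines[0].split()[:2])
    alignment = {}
    for line in lines[1:1 + n_taxa]:
        name, rest = line.split(None, 1)
        seq = "".join(rest.split())  # tolerate spaces inside the sequence
        if len(seq) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, found {len(seq)}")
        alignment[name] = seq
    if len(alignment) != n_taxa:
        raise ValueError(f"expected {n_taxa} taxa, found {len(alignment)}")
    return alignment


example = """2 8
sampleA    ACGT-NAC
sampleB    ACGTTNAC
"""
aln = read_phylip(example)
print(aln["sampleB"])  # ACGTTNAC
```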
Advances in genomics have greatly enhanced our understanding of mountain biodiversity, providing new insights into the complex and dynamic mechanisms that drive the formation of mountain biotas. These span from broad biogeographic patterns to population dynamics and adaptations to these environments. However, significant challenges remain in integrating large-scale and fine-scale findings into a comprehensive understanding of mountain biodiversity. One significant challenge is the lack of genomic data, particularly in historically understudied arid regions, where reptiles are a particularly diverse vertebrate group. In the present study, we assembled a de novo genome-wide SNP dataset for the complete endemic reptile fauna of a mountain range (19 described species, with more than 600 specimens sequenced) and integrated state-of-the-art biogeographic analyses at the population, species and community levels. Thus, we provide for the first time a holistic integration of how a whole ende...

Raw demultiplexed ddRADseq reads are available and can be processed with ipyrad (https://ipyrad.readthedocs.io/). Supplementary figures, supplementary tables, datasets and tree files are uploaded as separate files.

# Title of Dataset: Integrating genomics and biogeography to unravel the origin of a mountain biota: The case of a reptile endemicity hotspot in Arabia
Brief summary of dataset contents.
Data archive for:
Integrating genomics and biogeography to unravel the origin of a mountain biota: The case of a reptile endemicity hotspot in Arabia
by Bernat Burriel-Carranza, Héctor Tejero-Cicuéndez, Albert Carné, Gabriel Riaño, Adrián Talavera, Saleh Al Saadi, Johannes Els, Jiří Šmíd, Karin Tamar, Pedro Tarroso and Salvador Carranza
This dataset contains the raw ddRADseq reads, data files that should allow replication of the workflow, and the phylogenomic and phylogenetic trees produced in this work. It also contains supplementary material (Figures and Tables) and the extended methods for this publication.
In the present work, we assembled a large genomic database (n = 661) for all endemic reptiles of the Hajar Mountains. We investigated the diversity, population stru...
https://spdx.org/licenses/CC0-1.0.html
Background
Phylogeographical approaches explain the genetic diversity of local organisms in the context of their geological and geographic environments. Genetic diversity can therefore serve as a proxy for geological history. Here we propose a genus of woodland isopod, Ligidium, as a marker of geological history in relation to orogeny and the Quaternary glacial cycles.
Results
Mitochondrial analysis of 721 individuals from 97 sites across Japan revealed phylogenetic divergence between the northeastern and southwestern Japan arcs between 7 and 3.5 million years ago. It also showed repeated population expansions in northeastern Japan in response to Quaternary glacial and interglacial cycles. Genome-wide analysis of 83 selected individuals revealed multiple nuclear genetic clusters. The genomic groupings were consistent with the local geographic distribution, indicating that the Ligidium phylogeny reflects its migration history.
Conclusion
Ligidium DNA sequence analysis can provide insight into the geological, geographical, and paleoenvironmental history of the studied region.
Methods
Sample collection
We surveyed Ligidium populations in Japan, collecting 828 Ligidium specimens from 97 sites and sequencing 721 of them (Fig. 1, Table 1). Samples were preserved in 70–99.5% ethanol in 2-mL microtubes at 4°C or room temperature.
DNA extraction, amplification, and sequencing
Genomic DNA was isolated from abdominal and leg muscle with a DNA Mini Kit (Qiagen, Germantown, MD, USA). PCR amplification was conducted using the primer pair LCO-1490 and HCO-2198. Amplification and cycling conditions followed a previous study; details of the experimental conditions are provided in the Supplemental Data. Sequencing was conducted on an ABI 3130 Genetic Analyzer (Applied Biosystems, Waltham, MA, USA). Sequences were checked and assembled using MEGA7. Mitochondrial CO1 sequences for the 721 individuals were determined and aligned using ClustalW.
RAD-seq
We performed RAD-seq analysis to search for SNPs in individuals from Niigata and Hokkaido obtained in previous studies, from two regions in northern Japan (Aomori and Sendai), and from two regions in western Japan (Shizuoka and Sendai). Tables 1 and S1 list the samples used, and the sampling sites are shown in Fig. 1c. Genomic DNA was isolated from almost whole-body tissue with a DNA Mini Kit (Qiagen). RAD-seq libraries were prepared with the EcoRI and BglII restriction enzymes. The library was sequenced as 150 + 150 bp paired-end reads in one lane of an Illumina HiSeq X instrument (Illumina, San Diego, CA, USA) by Macrogen (Seoul, South Korea). Raw reads were trimmed using Trimmomatic 0.39 with the following parameters: ILLUMINACLIP:adapter.fasta:2:30:10:keepBothReads, SLIDINGWINDOW:4:15, CROP:132, HEADCROP:2, and MINLEN:130. Sequences are available at the DNA Data Bank of Japan (DDBJ) Sequence Read Archive (DRA014204).
We used two pipelines to call SNPs: denovo_map.pl, provided by Stacks, and ipyrad. Following a previous study, we tested the following combinations of denovo_map.pl parameters: (n, M) = (2, 1), (3, 2), (4, 3), and (5, 6). We selected (n, M) = (2, 1), which called the most SNPs. We used the Stacks populations program to analyze populations of individual samples, calculate population genetic statistics, and export data in various output formats for analysis. PLINK v1.90b6.18 was used for data handling. Alleles with a frequency < 1% and sites with > 50% heterozygosity were removed, and only SNPs shared by ~80% of individuals were retained. With ipyrad, loci with > 50% heterozygosity were removed, and SNPs shared by ~70% of the local populations were retained. Using TASSEL 5, we retained SNPs shared by at least two individuals and filtered out individuals genotyped at fewer than 80% of all SNPs. After filtering with TASSEL 5, we used PGDSpider to convert the VCF files into the formats required for downstream analyses.
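The per-SNP and per-individual thresholds above can be illustrated with a short Python sketch. This is not how PLINK or TASSEL 5 implement them. It is a minimal example under three assumptions:

- Genotypes are coded as alternate-allele counts (0, 1, 2; None for missing).
- "Allele frequency < 1%" means minor allele frequency.
- "Heterozygosity" means observed heterozygosity among called genotypes.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed genotype coding: number of alternate alleles (0, 1 or 2); None = missing call.
Genotype = Optional[int]


def site_passes(calls: Sequence[Genotype], min_maf: float = 0.01,
                max_het: float = 0.5, min_call_rate: float = 0.8) -> bool:
    """Per-SNP filters: call rate, minor allele frequency and observed heterozygosity."""
    called = [g for g in calls if g is not None]
    if not called or len(called) / len(calls) < min_call_rate:
        return False
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    het = sum(g == 1 for g in called) / len(called)
    return maf >= min_maf and het <= max_het


def filter_matrix(matrix: List[List[Genotype]], min_ind_call_rate: float = 0.8,
                  **site_kwargs) -> Tuple[List[int], List[int]]:
    """Return (kept SNP indices, kept individual indices).

    matrix[i][j] is the genotype of individual i at SNP j. SNPs are filtered
    first; individuals called at fewer than min_ind_call_rate of the remaining
    SNPs are then dropped.
    """
    n_snps = len(matrix[0])
    keep_sites = [j for j in range(n_snps)
                  if site_passes([row[j] for row in matrix], **site_kwargs)]
    keep_inds = []
    for i, row in enumerate(matrix):
        kept = [row[j] for j in keep_sites]
        if kept and sum(g is not None for g in kept) / len(kept) >= min_ind_call_rate:
            keep_inds.append(i)
    return keep_sites, keep_inds
```

Filtering SNPs before individuals mirrors the order described in the text. Reversing the order can change which individuals pass, because each individual's call rate is computed over the SNPs that survive.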
Genetic structure analysis
We used Structure v. 2.3.4 to infer the genetic structure of populations by Bayesian cluster analysis. Ten independent runs were performed, with the burn-in period and the number of Markov chain Monte Carlo (MCMC) iterations set to 10^5 and 10^6, respectively. The maximum value of K was chosen based on the mtDNA results and the geographical distribution. For the Structure analysis, one SNP was randomly sampled from each locus to avoid the effects of linkage disequilibrium. The Python script vcf_single_snp.py (from the pimbongaerts/radseq repository on GitHub) was used to extract one SNP per locus from the ipyrad output, and the results were plotted with the R package pophelper. In addition, PCA was performed with the adegenet package in R to visualize genetic differences among populations. Pairwise Fst values for the RAD-seq dataset were obtained with Arlequin 3.5.1.2 and used to test the population structure suggested by the cluster analysis; statistical significance was assessed with 1000 resampling replicates.
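The one-SNP-per-locus thinning can be illustrated with a short Python sketch. This is a hypothetical re-implementation of the idea, not the code of vcf_single_snp.py. It assumes that each RAD locus appears as its own CHROM entry in the ipyrad VCF, as in a de novo assembly. It then keeps one randomly chosen SNP per locus.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Snp = Tuple[str, int]  # (locus / CHROM name, position within the locus)


def snps_from_vcf(lines: Iterable[str]) -> Iterator[Snp]:
    """Yield (CHROM, POS) for every data line of a VCF, skipping header lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], int(fields[1])


def one_snp_per_locus(snps: Iterable[Snp], seed: Optional[int] = None) -> List[Snp]:
    """Randomly keep a single SNP from each locus, preserving the order of loci."""
    rng = random.Random(seed)  # a fixed seed makes the thinning reproducible
    by_locus: Dict[str, List[Snp]] = defaultdict(list)
    for locus, pos in snps:
        by_locus[locus].append((locus, pos))
    return [rng.choice(sites) for sites in by_locus.values()]
```

Sampling at random, rather than always taking the first SNP, avoids systematically favouring positions near the start of each read.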
Individuals were sampled in the wild, and tissue samples were collected for DNA extraction. Library preparation and sequencing were performed at the University of Wisconsin Biotechnology Center (UWBC) following a genotyping-by-sequencing (GBS) approach. Libraries were prepared using the restriction enzyme ApeKI (recognition site G^CWGC, where W = A or T; the cut leaves a C[AT]G overhang) and sequenced on a single lane of an Illumina HiSeq 2500 with single-end 101 bp reads. Raw data were assembled de novo with ipyrad (Eaton and Overcast 2020; https://ipyrad.readthedocs.io/en/master/). We further processed the data using VCFtools v.0.1.13 (see the publication for a description of the bioinformatics).
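To make the enzyme choice concrete, the sketch below performs an in-silico ApeKI digest in Python. It is an illustration only and makes three simplifications:

- It scans the top strand for GCWGC. The site is palindromic, so the other strand yields the same positions.
- It places the cut after the first G.
- It ignores the staggered overhang and any fragment size selection used in the actual library preparation.

```python
import re
from typing import List

# ApeKI recognition site GCWGC (W = A or T). The lookahead lets overlapping sites match.
APEKI_SITE = re.compile(r"(?=GC[AT]GC)")


def apeki_cut_positions(seq: str) -> List[int]:
    """0-based top-strand positions where ApeKI would cut (between the leading G and C)."""
    return [m.start() + 1 for m in APEKI_SITE.finditer(seq.upper())]


def digest(seq: str) -> List[str]:
    """Split a sequence into top-strand fragments at every ApeKI cut position."""
    bounds = [0] + apeki_cut_positions(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    print(digest("AAGCAGCTTGCTGCAA"))
```

Because W matches only A or T, a site such as GCCGC is not cut. This degeneracy makes ApeKI cut frequently enough to sample many loci across the genome.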
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Raw reads were demultiplexed, filtered and assembled using ipyrad v0.9.59 (Eaton and Overcast, 2020). Demultiplexing was based on the unique barcode and adapter sequences. Reads were then filtered using the strict filter for Illumina adapters and trimmed to remove low-quality read edges. A reference-based assembly was performed using the draft reference genome CactoFuEDEI.fa. We set the maximum number of unique alleles allowed in consensus sequences to 2. We then ran ipyrad to obtain four datasets, with the minimum number of samples per locus (msl) set to 25%, 50%, 80% and 90% of samples, respectively.
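ipyrad's min_samples_locus parameter takes a count of samples rather than a percentage, so each msl threshold has to be converted to an integer. The sketch below does this conversion. It assumes the cut-off is rounded up and uses a hypothetical sample count, since the number of samples is not given here.

```python
def min_samples_locus(n_samples: int, percent: int) -> int:
    """Smallest sample count that is at least `percent`% of n_samples (rounded up)."""
    if n_samples < 1 or not 0 < percent <= 100:
        raise ValueError("need n_samples >= 1 and 0 < percent <= 100")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-percent * n_samples // 100)


if __name__ == "__main__":
    n = 37  # hypothetical number of samples, for illustration only
    for pct in (25, 50, 80, 90):
        print(f"msl {pct}% of {n} samples -> min_samples_locus = {min_samples_locus(n, pct)}")
```

With the hypothetical 37 samples, the four thresholds map to 10, 19, 30 and 34 samples per locus. Higher thresholds yield fewer loci but less missing data per locus.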