Patterson’s D, also known as the ABBA-BABA statistic, and related statistics such as the f4-ratio, are commonly used to assess evidence of gene flow between populations or closely related species. Currently available implementations often require custom file formats, implement only small subsets of the available statistics, and are impractical to evaluate all gene flow hypotheses across datasets with many populations or species due to computational inefficiencies. Here we present a new software package Dsuite, an efficient implementation allowing genome scale calculations of the D and f4-ratio statistics across all combinations of tens or hundreds of populations or species directly from a variant call format (VCF) file. Our program also implements statistics suited for application to genomic windows, providing evidence of whether introgression is confined to specific loci and it can also aid in interpretation of a system of f4-ratio results with the use of the ‘f-branch’ method. Dsuite is available at https://github.com/millanek/Dsuite, is straightforward to use, substantially more computationally efficient than comparable programs, and provides a convenient suite of tools and statistics, including some not previously available in any software package. Thus, Dsuite facilitates the assessment of evidence for gene flow, especially across larger genomic datasets.
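As an illustration of the statistic named in this description, the following is a minimal Python sketch of Patterson's D computed from per-site derived allele frequencies. It is not Dsuite's code: the function name `patterson_d` and its arguments are hypothetical, and it assumes the frequencies are already polarized so that the outgroup carries the ancestral allele (A) and B denotes the derived allele.

```python
from typing import Sequence


def patterson_d(
    p1: Sequence[float],
    p2: Sequence[float],
    p3: Sequence[float],
    p_out: Sequence[float],
) -> float:
    """Patterson's D for the tree (((P1, P2), P3), Outgroup).

    Each argument holds derived allele frequencies, one value per site.
    Assumption (not taken from the source): alleles are polarized so that
    a frequency of 0 in the outgroup means it carries the ancestral allele.
    """
    abba = 0.0
    baba = 0.0
    for a, b, c, o in zip(p1, p2, p3, p_out):
        # Expected ABBA pattern: P1 ancestral, P2 and P3 derived, outgroup ancestral.
        abba += (1 - a) * b * c * (1 - o)
        # Expected BABA pattern: P1 and P3 derived, P2 ancestral, outgroup ancestral.
        baba += a * (1 - b) * c * (1 - o)
    total = abba + baba
    if total == 0:
        raise ValueError("no ABBA or BABA signal in the supplied sites")
    return (abba - baba) / total


if __name__ == "__main__":
    # One pure ABBA site and one uninformative site.
    print(patterson_d([0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]))
```

Under this convention, a positive value indicates an excess of allele sharing between P2 and P3, a negative value an excess between P1 and P3, and a value near zero is what incomplete lineage sorting alone would be expected to produce.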
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Detailed results from each dataset. Description: Input parameters, sensitivity, significance information and linear regression of the f-statistics, from all datasets in three simulation schemes. (XLSX 214 kb)
README: General summary of scripts and commands used in this study.
Figure_1.R: Script to generate plots in Figure 1 B and C.
compare_f_estimators.r: Script to generate Figures 2 and S1.
Figures_3_S3.R: Script to generate Figures 3 and S3. Requires data files such as Heliconius_autosome_windows_5kb.csv and Heliconius_Zchromosome_windows_5kb.csv. These names are hard-coded into this script, so editing is required to load different files.
Figure_4.R: Script to generate Figure 4. Requires input files such as model_files_win10000_s0.01_l5000_r50.alternate_models.dxy.summary.sg.tsv, generated using run_model_combinations.py, shared_ancestry_simulator.R and generate_summary_statistics.R.
Figure_5.R: Script to generate Figure 5. Requires input files such as model_files_win10000_s0.01_l5000_r50.alternate_models.dxy.summary.sg.tsv, generated using run_model_combinations.py, shared_ancestry_simulator.R and generate_summary_statistics.R.
egglib_sliding_windows.py: Python script to calculate ABBA BABA statistics...
The evolutionary implications and frequency of hybridization and introgression are increasingly being recognized across the tree of life. To detect hybridization from multi-locus and genome-wide sequence data, a popular class of methods is based on summary statistics from subsets of 3 or 4 taxa. However, these methods often carry the assumption of a constant substitution rate across lineages and genes, which is commonly violated in many groups. In this work, we quantify the effects of rate variation on the D test (also known as ABBA-BABA test), the D3 test, and HyDe. All three tests are used widely across a range of taxonomic groups, in part because they are very fast to compute. We consider rate variation across species lineages, across genes, their lineage-by-gene interaction, and residual variation across gene-tree edges. We do so by simulating gene trees within species networks according to a birth-death-hybridization process so as to capture a range of realistic species phylogenies...
https://spdx.org/licenses/CC0-1.0.html
The role of interspecific hybridization has recently seen increasing attention, especially in the context of diversification dynamics. Genomic research has now made it abundantly clear that both hybridization and introgression – the exchange of genetic material through hybridization and backcrossing – are far more common than previously thought. Besides cases of ongoing or recent genetic exchange between taxa, an increasing number of studies report “ancient introgression” – referring to results of hybridization that took place in the distant past. However, it is not clear whether commonly used methods for the detection of introgression are applicable to such old systems, given that most of these methods were originally developed for analyses at the level of populations and recently diverged species, affected by recent or ongoing genetic exchange. In particular, the assumption of constant evolutionary rates, which is implicit in many commonly used approaches, is more likely to be violated as evolutionary divergence increases. To test the limitations of introgression detection methods when being applied to old systems, we simulated thousands of genomic datasets under a wide range of settings, with varying degrees of among-species rate variation and introgression. Using these simulated datasets, we showed that some commonly applied statistical methods, including the D-statistic and certain tests based on sets of local phylogenetic trees, can produce false-positive signals of introgression between divergent taxa that have different rates of evolution. These misleading signals are caused by the presence of homoplasies occurring at different rates in different lineages. To distinguish between the patterns caused by rate variation and genuine introgression, we developed a new test that is based on the expected clustering of introgressed sites along the genome, and implemented this test in the program Dsuite. 
Methods: Genomic datasets were simulated with msprime and processed with Dsuite, IQ-TREE, SNaQ, and QuIBL.
Over the past 15 years, the D-statistic, a four-taxon test for organismal admixture (hybridization, or introgression) which incorporates single nucleotide polymorphism data with allelic patterns ABBA and BABA, has seen considerable use. This statistic seeks to discern significant deviation from either a given species tree assumption, or from the balanced incomplete lineage sorting that could otherwise defy this species tree. However, while the D-statistic can successfully discriminate admixture from incomplete lineage sorting, it is not a simple matter to determine the directionality of admixture using only four-leaf tree models. As such, methods have been developed that use 5 leaves to evaluate admixture. Among these, the DFOIL method, which tests allelic patterns on the “symmetric” tree S = (((1,2),(3,4)),5), succeeds in finding admixture direction for many five-taxon examples. However, DFOIL does not make full use of all symmetry, nor can DFOIL function properly when ancient samples ...
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
When multiple speciation events occur rapidly in succession, discordant genealogies due to incomplete lineage sorting (ILS) can complicate the detection of introgression. A variety of methods, including the D-statistic (a.k.a. the “ABBA–BABA test”), have been proposed to infer introgression in the presence of ILS for a four-taxon clade. However, no integrated method exists to detect introgression using allelic patterns for more complex phylogenies. Here we explore the issues associated with previous systems of applying D-statistics to a larger tree topology, and propose new DFOIL tests as an integrated framework to infer both the taxa involved in and the direction of introgression for a symmetric five-taxon phylogeny. Using theory and simulations, we show that the DFOIL statistics correctly identify the introgression donor and recipient lineages, even at low rates of introgression. DFOIL is also shown to have extremely low false-positive rates. The DFOIL tests are computationally inexpensive to calculate and can easily be applied to phylogenomic data sets, both genome-wide and in windows of the genome. In addition, we explore both the principles and problems of introgression detection in even more complex phylogenies.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Distribution of gene tree divergence times for all goose species. All distributions show a single peak, indicating gene flow during divergence. The divergence time of several gene trees was close to zero, suggesting low levels of recent gene flow between certain species. Final three figures represent the three subspecies of Brent Goose, which is depicted in the lower right panel. (ZIP 2715 kb)
Phylogenetic relationships among recently diverged species are often difficult to resolve due to insufficient phylogenetic signal in available markers and/or conflict among gene trees. Here we explore the use of reduced-representation genome sequencing, specifically in the form of restriction-site associated DNA (RAD), for phylogenetic inference and the detection of ancestral hybridization in non-model organisms. As a case study, we investigate Pedicularis section Cyathophora, a systematically recalcitrant clade of flowering plants in the broomrape family (Orobanchaceae). Two methods of phylogenetic inference, maximum likelihood and Bayesian concordance, were applied to data sets that included as many as 40,000 RAD loci. Both methods yielded similar topologies that included two major clades: a “rex-thamnophila” clade, composed of two species and several subspecies with relatively low floral diversity, and geographically widespread distributions at lower elevations, and a “superba” clade...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Estimates of historical effective population sizes for all goose species, based on a PSMC analysis. Final three figures represent the three subspecies of Brent Goose, which is depicted in the lower right panel. (ZIP 766 kb)
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Background: Studying patterns of introgression can illuminate the role of hybridization in speciation, and help guide decisions relevant to the conservation of rare taxa. Vipera magnifica and Vipera orlovi are small vipers that have high conservation status due to their rarity and restricted distributions in an area of the Caucasus region where two other related species are present – V. kaznakovi and V. renardi. Despite numerous observations of hybridization between different species of small vipers, and the potential of a hybrid origin for V. magnifica and V. orlovi based on their distribution with respect to V. kaznakovi and V. renardi, hypotheses of a hybrid origin have not been formally tested. Here we generate genomic-scale data by performing next generation sequencing of double digest restriction-site associated DNA libraries, and use these multilocus data to test whether these two species are of hybrid origin. Results: We generated over nine hundred loci for 38 specimens of six taxa, and analysed the dataset using Bayesian clustering and multivariate methods, as well as Patterson D-statistics, which can distinguish between incomplete lineage sorting and introgression as explanations for shared polymorphism. The results demonstrate a pattern of historical admixture in the two purported hybrids that is consistent with past gene flow from V. renardi into V. kaznakovi. The average admixture proportion in individuals was low (6.39 %) in the case of V. magnifica, but was higher in V. orlovi (19.02 %). We also show that the specific individual samples used in D-statistic tests can have a significant impact on inferences regarding the magnitude of introgression, suggesting the importance of including multiple individuals in these analyses. Conclusions: Our results support the conclusion that both V. orlovi and V. magnifica had formed through a hybridization event between V. kaznakovi and V. renardi. 
Given the low proportion of admixture and the absence of clear ecological and morphological differences, V. magnifica should be treated as a marginal population of V. kaznakovi. Further studies that analyse the ecological segregation of V. orlovi from its parental taxa and search for evolutionary consequences of hybridisation would clarify whether V. orlovi is a distinct hybrid species. Until then, we recommend preserving the current taxonomy and protection status of V. orlovi.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABBA/BABA (D-statistics) and f4-ratio data generated by Dsuite's Dtrios function.
ZIP files contain all input files (Dryad) and scripts (Zenodo) needed to run a set of population genetic analyses (plus a README file).
ABBA-BABA: Data and scripts used to perform the ABBA-BABA tests (D-statistic), including the following files:
DATA:
- data_files folder: VCF and individual info files
SCRIPTS:
- ProcessData_Neodiprion_filterDPperind.sh: bash script to filter the VCF file based on DP and HWE and obtain a genotype matrix file
- analysis_NeoLecPin_feb2017.r: R script to read the genotype matrix file, compute the ABBA-BABA tests (D-statistics), and compute their significance using block-jackknife
- RScripts folder: files with definition of functions used by the bash script ProcessData_Neodiprion_filterDPperind.sh and R script analysis_NeoLecPin_feb2017.r
ADMIXTURE: Data and scripts to run the admixture analysis for the hybrids, N. pinetum and N. lecontei in Kentucky, including the following files:
DATA:
- KY_Pin_F1_nohet_7x_0.5miss_0.01maf.recode.vcf is the vcf file...
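The block-jackknife significance test mentioned in this description can be sketched as follows. This is a generic delete-one-block jackknife written in Python for illustration, not the R code distributed with the dataset; the function name `block_jackknife_d` is hypothetical, and the sketch assumes the genome has already been split into blocks of roughly equal size with ABBA and BABA counts summed per block.

```python
import math
from typing import Sequence, Tuple


def block_jackknife_d(
    abba_blocks: Sequence[float],
    baba_blocks: Sequence[float],
) -> Tuple[float, float, float]:
    """Return (D, standard error, Z-score) from per-block ABBA/BABA sums.

    Assumption: blocks are of roughly equal size, so the unweighted
    delete-one jackknife variance is used.
    """
    n = len(abba_blocks)
    if n < 2 or n != len(baba_blocks):
        raise ValueError("need at least two blocks of matching length")

    total_abba = sum(abba_blocks)
    total_baba = sum(baba_blocks)
    if total_abba + total_baba == 0:
        raise ValueError("no ABBA or BABA counts supplied")
    d_full = (total_abba - total_baba) / (total_abba + total_baba)

    # Recompute D with each block left out in turn.
    leave_one_out = []
    for a, b in zip(abba_blocks, baba_blocks):
        num = (total_abba - a) - (total_baba - b)
        den = (total_abba - a) + (total_baba - b)
        if den == 0:
            raise ValueError("a leave-one-out subset has no informative sites")
        leave_one_out.append(num / den)

    # Jackknife variance of the leave-one-out estimates.
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((d - mean_loo) ** 2 for d in leave_one_out)
    se = math.sqrt(variance)
    z = d_full / se if se > 0 else math.inf
    return d_full, se, z


if __name__ == "__main__":
    d, se, z = block_jackknife_d([10, 12, 11, 9], [5, 6, 4, 5])
    print(f"D = {d:.3f}, SE = {se:.3f}, Z = {z:.2f}")
```

Blocks are used instead of individual sites because neighbouring sites are linked and therefore not independent; each block should be long enough that separate blocks can be treated as approximately independent.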
Background: Porous species boundaries can be a source of conflicting hypotheses, particularly when coupled with variable data and/or methodological approaches. Their impacts can often be magnified when non-model organisms with complex histories of reticulation are investigated. One such example is the genus Catostomus (Osteichthys, Catostomidae), a freshwater fish clade with conflicting morphological and mitochondrial phylogenies. The former is hypothesized as reflecting the presence of admixed genotypes within morphologically distinct lineages, whereas the latter is interpreted as the presence of distinct morphologies that emerged multiple times through convergent evolution. We tested these hypotheses using multiple methods, including multispecies coalescent and concatenated approaches. Patterson's D-statistic was applied to resolve potential discord, examine introgression, and test the putative hybrid origin of two species. We also applied naïve binning to explore potential effect...
Craugastor_augusti_SNP_data
Data included in zipfile: (1) STACKS commands used to process and assemble RADtags, (2) 0.20 missing biallelic data matrix [.stru format], (3) 0.25 missing biallelic data matrix [.stru format], (4) 0.50 missing biallelic data matrix [.stru format], (5) 0.10 missing fixed data matrix [.phy format], (6) 0.25 missing fixed data matrix [.phy format], and (7) 0.50 missing fixed data matrix [.phy format]
Craugastor_augusti_data_Mol_Ecol.zip
Craugastor_augusti_mtDNA_data
Mitochondrial alignment (12S) in .phy format
12S_mtDNA_alignment.phy
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
List of samples used with information about species, locality, date of collection, collector, and institution where collections are stored. (XLS 34 kb)
High-throughput sequencing is helping biologists to overcome the difficulties of inferring the phylogenies of recently diverged taxa. The present study analyzes the phylogenetic signal of genomic regions with different inheritance patterns using genome skimming and ddRAD-seq in a species-rich Andean genus (Diplostephium) and its allies. We analyzed the complete nuclear ribosomal cistron, the complete chloroplast genome, a partial mitochondrial genome, and a nuclear-ddRAD matrix separately with phylogenetic methods. We applied several approaches to understand the causes of incongruence among datasets, including simulations and the detection of introgression using the D-statistic (ABBA-BABA test). We found significant incongruence among the nuclear, chloroplast, and mitochondrial phylogenies. The strong signal of hybridization found by simulations and the D-statistic among genera and inside the main clades of Diplostephium indicate reticulate evolution as a main cause of phylogenetic in...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
N, number of sequences; S, number of SNPs (excluding insertion-deletions); π, nucleotide diversity estimated in DnaSP; θ, most probable estimate in LAMARC; θ(95%), 95% credible intervals; all π and θ are ×10^3; D, Tajima's D-statistic; D*, Fu and Li's D-statistic; F*, Fu and Li's F-statistic; FST and ΦCT, genetic differentiation between wild boars and domestic pigs; all FST and ΦCT are ×10^2; *, P
https://creativecommons.org/share-your-work/public-domain/pdm
This data set provides statistics about employer and nonemployer businesses from 2020 for the nation, states, and metropolitan statistical areas (MSA). It includes the number of firms, revenue, number of employees, and annual payroll, broken down by industry and owner demographics including sex, ethnicity, race, and veteran status. About NES-D: The Nonemployer Statistics by Demographics series (NES-D) provides information on the demographic characteristics of nonemployer businesses. The NES-D is the result of a research project by the Census Bureau to complete the picture of U.S. business ownership by demographics for the United States. Historically, the quinquennial Survey of Business Owners (SBO) provided the only comprehensive source of information on both employer and nonemployer businesses by demographic characteristics of the business owners. In 2017, the SBO was replaced by the Annual Business Survey (ABS). The ABS is an annual survey that collects demographic characteristics from employer businesses. However, the ABS excludes the collection of demographic data from nonemployer businesses. The NES-D was developed to produce estimates similar to those of the ABS on owner demographics for nonemployer businesses. The NES-D is not a survey; rather, it leverages existing individual-level administrative records to assign demographic characteristics to the universe of nonemployer businesses. Demographic characteristics including sex, ethnicity, race, veteran status, owner age, place of birth, and U.S. citizenship are assigned to nonemployer business owners. Together, the NES-D and the ABS will continue to provide the only source of detailed and comprehensive statistics on the scope, nature and activities of all U.S. businesses by the demographic characteristics of the business owners. NES-D data will be available annually by detailed geography and industry levels, receipt-size class, and legal form of organization (LFO).
Beginning with the 2019 NES-D, the data will include urban and rural classification.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Gene flow can impede the evolution of reproductive isolating barriers between species. Reinforcement is the process by which pre-zygotic reproductive isolation evolves in sympatry due to selection to decrease costly hybridization. It is known that reinforcement can be prevented by too much gene flow, but we still do not know how often pre-zygotic barriers have evolved in the presence of gene flow or how much gene flow can occur during reinforcement. Flower color divergence in the native Texas wildflower, Phlox drummondii, is one of the best-studied cases of reinforcement. Here we use genomic analyses to infer gene flow between P. drummondii and a closely related sympatric species, P. cuspidata. We de novo assemble transcriptomes of four Phlox species to determine the phylogenetic relationships between these species, and find extensive discordance among gene tree topologies across genes. We find evidence of introgression between sympatric P. drummondii and P. cuspidata using the D-statistic, and use phylogenetic analyses to infer the predominant direction of introgression. We investigate geographic variation in gene flow by comparing the relative divergence of genes displaying discordant gene trees between an allopatric and sympatric sample. These analyses support the hypothesis that sympatric P. drummondii has experienced gene flow with P. cuspidata. We find that gene flow between these species is asymmetrical, which could explain why reinforcement caused divergence in only one of the sympatric species. Given the previous research in this system, we suggest strong selection can explain how reinforcement successfully evolved in this system despite gene flow in sympatry.