Patterson’s D, also known as the ABBA-BABA statistic, and related statistics such as the f4-ratio are commonly used to assess evidence of gene flow between populations or closely related species. Currently available implementations often require custom file formats, implement only small subsets of the available statistics, and are too computationally inefficient to evaluate all gene flow hypotheses across datasets with many populations or species. Here we present Dsuite, a new software package that provides an efficient implementation allowing genome-scale calculations of the D and f4-ratio statistics across all combinations of tens or hundreds of populations or species directly from a variant call format (VCF) file. Our program also implements statistics suited for application to genomic windows, providing evidence of whether introgression is confined to specific loci, and it can aid in the interpretation of a system of f4-ratio results through the ‘f-branch’ method. Dsuite is available at https://github.com/millanek/Dsuite, is straightforward to use, is substantially more computationally efficient than comparable programs, and provides a convenient suite of tools and statistics, including some not previously available in any software package. Thus, Dsuite facilitates the assessment of evidence for gene flow, especially across larger genomic datasets.
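For readers unfamiliar with the statistic, the following is a minimal sketch of how Patterson's D can be computed from per-site derived-allele frequencies in a four-taxon arrangement (((P1, P2), P3), Outgroup). It is not Dsuite's code: the function names and the toy frequencies are hypothetical, and the per-site weighting shown is the commonly used frequency-based formulation, assumed here for illustration.

```python
def site_pattern_weights(p1, p2, p3, p_out):
    """Return the ABBA and BABA weights for one site.

    Each argument is the derived-allele frequency (0..1) in one population.
    Assumption: the standard frequency-based formulation of the site patterns.
    """
    abba = (1 - p1) * p2 * p3 * (1 - p_out)
    baba = p1 * (1 - p2) * p3 * (1 - p_out)
    return abba, baba


def patterson_d(sites):
    """Compute D = (sum ABBA - sum BABA) / (sum ABBA + sum BABA).

    `sites` is an iterable of (p1, p2, p3, p_out) tuples, one per variant site.
    """
    abba_total = 0.0
    baba_total = 0.0
    for p1, p2, p3, p_out in sites:
        abba, baba = site_pattern_weights(p1, p2, p3, p_out)
        abba_total += abba
        baba_total += baba
    denominator = abba_total + baba_total
    if denominator == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba_total - baba_total) / denominator


if __name__ == "__main__":
    # Hypothetical frequencies for three sites, for illustration only.
    toy_sites = [
        (0.0, 1.0, 1.0, 0.0),  # pure ABBA site
        (1.0, 0.0, 1.0, 0.0),  # pure BABA site
        (0.1, 0.6, 0.9, 0.0),  # polymorphic site contributing to both patterns
    ]
    print(patterson_d(toy_sites))
```

A positive value indicates an excess of allele sharing between P2 and P3, a negative value an excess between P1 and P3. In practice, significance is assessed separately, for example with a block jackknife, which this sketch omits.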
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
The role of interspecific hybridization in the context of diversification dynamics has recently seen increasing attention. Genomic research has now made it abundantly clear that both hybridization and introgression - the exchange of genetic material through hybridization - are far more common than previously thought. Moreover, even highly divergent species were found to hybridize and backcross. These findings raise the question of whether commonly used methods for the detection of introgression are applicable to such divergent hybridizing species, given that most of these methods were originally developed for analyses at the level of populations and recently diverged species. In particular, the assumption of constant evolutionary rates, which is implicit in many commonly used approaches, is more likely to be violated as evolutionary divergence increases. To test the limitations of introgression detection methods when applied to divergent species, we simulated thousands of genomic datasets under a wide range of settings, with varying degrees of among-species rate variation and introgression. Using these simulated datasets, we were able to show that commonly applied statistical methods, including the D-statistic and tests based on sets of phylogenetic trees, produce false-positive signals of introgression between highly divergent taxa when these have different rates of evolution. These misleading signals are caused by homoplasies that occur at different rates when rate variation is present. To distinguish between the patterns caused by rate variation and genuine introgression, we developed a new test that is based on the expected clustering of introgressed sites and implemented this test in the program Dsuite.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Input and output files for analyses of divergence times of Gadinae with StarBEAST2
gadinae_starbeast_alignments.tgz gadinae_starbeast.xml gadinae_starbeast.log gadinae_starbeast.trees gadinae_starbeast.tre
Input and output files for analyses of divergence times and introgression among Gadus, Arctogadus, and Boreogadus with AIM
gadus_arctogadus_boreogadus_aim_alignments.tgz gadus_arctogadus_boreogadus_aim.xml gadus_arctogadus_boreogadus_aim.log gadus_arctogadus_boreogadus_aim.trees gadus_arctogadus_boreogadus_aim_bpp0763.tre gadus_arctogadus_boreogadus_aim_bpp0234.tre
Dataset used for analyses of introgression among Gadus, Arctogadus, and Boreogadus with Dsuite
gadus_arctogadus_boreogadus_dsuite.vcf.gz
Alignments and output trees for analyses of tree-based signals of introgression among Gadus, Arctogadus, and Boreogadus with IQ-TREE
gadus_arctogadus_boreogadus_iqtree_alignments.tgz gadus_arctogadus_boreogadus_iqtree_trees.tgz
Datasets used for supergene-specific analyses of linkage disequilibrium in Gadus morhua with PLINK
gadus_morhua_supergene_lg01_plink.map gadus_morhua_supergene_lg01_plink.ped
gadus_morhua_supergene_lg02_plink.map gadus_morhua_supergene_lg02_plink.ped
gadus_morhua_supergene_lg07_plink.map gadus_morhua_supergene_lg07_plink.ped
gadus_morhua_supergene_lg12_plink.map gadus_morhua_supergene_lg12_plink.ped
Input and output files for analyses of divergence times among Gadus morhua populations with SNAPP
gadus_morhua_outside_supergenes_snapp.vcf.gz gadus_morhua_outside_supergenes_snapp.xml gadus_morhua_outside_supergenes_snapp.log gadus_morhua_outside_supergenes_snapp.tre gadus_morhua_outside_supergenes_snapp.trees
Dataset used for analyses of introgression among Gadus morhua populations with Dsuite
gadus_morhua_outside_supergenes_dsuite.vcf.gz
Dataset used for analyses of demography of Gadus morhua populations with Relate
gadus_morhua_outside_supergenes_relate.vcf.gz
Input and output files for supergene-specific analyses of divergence times among Gadus morhua populations with SNAPP
gadus_morhua_supergene_lg01_snapp.vcf.gz gadus_morhua_supergene_lg01_snapp.xml gadus_morhua_supergene_lg01_snapp.log gadus_morhua_supergene_lg01_snapp.trees gadus_morhua_supergene_lg01_snapp.tre
gadus_morhua_supergene_lg02_snapp.vcf.gz gadus_morhua_supergene_lg02_snapp.xml gadus_morhua_supergene_lg02_snapp.log gadus_morhua_supergene_lg02_snapp.trees gadus_morhua_supergene_lg02_snapp.tre
gadus_morhua_supergene_lg07_snapp.vcf.gz gadus_morhua_supergene_lg07_snapp.xml gadus_morhua_supergene_lg07_snapp.log gadus_morhua_supergene_lg07_snapp.trees gadus_morhua_supergene_lg07_snapp.tre
gadus_morhua_supergene_lg12_snapp.vcf.gz gadus_morhua_supergene_lg12_snapp.xml gadus_morhua_supergene_lg12_snapp.log gadus_morhua_supergene_lg12_snapp.trees gadus_morhua_supergene_lg12_snapp.tre
Datasets used for supergene-specific analyses of introgression among Gadus morhua populations with Dsuite
gadus_morhua_supergene_lg01_dsuite.vcf.gz gadus_morhua_supergene_lg02_dsuite.vcf.gz gadus_morhua_supergene_lg07_dsuite.vcf.gz gadus_morhua_supergene_lg12_dsuite.vcf.gz
Datasets used for supergene-specific analyses of demography of Gadus morhua populations with Relate
gadus_morhua_supergene_lg01_relate.vcf.gz gadus_morhua_supergene_lg02_relate.vcf.gz gadus_morhua_supergene_lg07_relate.vcf.gz gadus_morhua_supergene_lg12_relate.vcf.gz
Different genomic regions may reflect conflicting phylogenetic topologies on account of incomplete lineage sorting and/or gene flow. Genomic data are necessary to reconstruct the true species tree and explore potential causes of phylogenetic conflict. Here, we investigate the phylogenetic relationships of four Emberiza species (Aves: Emberizidae) and discuss the potential causes of the observed mitochondrial non-monophyly of Emberiza godlewskii (Godlewski's bunting) using phylogenomic analyses based on whole genome resequencing data from 41 birds. Phylogenetic analyses based on both the whole mitochondrial genome and ~39 kilobases from the non-recombining W chromosome reveal that the northern and southern populations of E. godlewskii are sister to E. cioides and E. cia, respectively. In contrast, phylogenetic analysis based on genome-wide data supports the monophyly of E. godlewskii with the following tree topology: (((E. godlewskii, E. cia), E. cioides), E. jankowskii). Using D-sta...
Dataset and supporting information from the manuscript "Phylogenetic conflict between species tree and maternally inherited gene trees in a clade of Emberiza buntings (Aves: Emberizidae)".
These files contain the original FASTA or VCF files for the phylogenetic analyses in the manuscript USYB-2023-045.
File "Concatenated_NRW.fas" contains aligned NRW sequences; File "Dsuite_output_data.txt" contains output data of the Dsuite analysis; huimei, liban, sandaomei, south and north represent E. cia, E. jankowskii, E. cioides, southern E. godlewskii and northern E. godlewskii, respectively; BBAA, ABBA and BABA represent the counts of sites that conform to the corresponding site patterns; for other parameters, please refer to the Dsuite tutorial; File "ldfil.recode.vcf.gz" contains the linkage-disequilibrium-pruned VCF file; File "mitochondrial_cytb.fas" contains aligned mitochondrial cytb sequences; File "mitochondrial_other_sequences.fas" contains aligned mitochondrial sequences without the cytb gene; File "MP-EST_input_20000 trees.rar" contains 20000 sliding window trees for the input to MP-EST analysis; File "ALSTRAL_input_trees_of_Z_chromosome.trees" contains 3583 sliding window trees from chromosome Z for the i...