Patterson’s D, also known as the ABBA-BABA statistic, and related statistics such as the f4-ratio, are commonly used to assess evidence of gene flow between populations or closely related species. Currently available implementations often require custom file formats, implement only small subsets of the available statistics, and are impractical for evaluating all gene flow hypotheses across datasets with many populations or species due to computational inefficiencies. Here we present a new software package, Dsuite, an efficient implementation allowing genome-scale calculations of the D and f4-ratio statistics across all combinations of tens or hundreds of populations or species directly from a variant call format (VCF) file. Our program also implements statistics suited for application to genomic windows, providing evidence of whether introgression is confined to specific loci, and it can also aid in the interpretation of a system of f4-ratio results with the use of the ‘f-branch’ method. Dsuite ...
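The sketch below shows what the D statistic measures. It computes genome-wide Patterson's D for the tree (((P1, P2), P3), O) from per-site derived-allele frequencies, using the frequency-based estimator that sums the expected ABBA and BABA pattern probabilities over biallelic sites. It is not Dsuite's code. The function name `patterson_d` and the input layout (one list of frequencies per population, aligned by site) are assumptions made for illustration.

```python
from typing import Sequence


def patterson_d(p1: Sequence[float], p2: Sequence[float],
                p3: Sequence[float], po: Sequence[float]) -> float:
    """Genome-wide Patterson's D for the tree (((P1, P2), P3), O).

    Hypothetical helper: each argument holds one population's
    derived-allele frequency at the same ordered set of biallelic sites.
    """
    if not (len(p1) == len(p2) == len(p3) == len(po)):
        raise ValueError("all frequency lists must cover the same sites")
    abba = baba = 0.0
    for f1, f2, f3, fo in zip(p1, p2, p3, po):
        # Expected probability of an ABBA pattern at this site
        abba += (1 - f1) * f2 * f3 * (1 - fo)
        # Expected probability of a BABA pattern at this site
        baba += f1 * (1 - f2) * f3 * (1 - fo)
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA signal in these sites")
    return (abba - baba) / (abba + baba)


# Three fixed sites: two ABBA patterns and one BABA pattern
print(patterson_d([0, 0, 1], [1, 1, 0], [1, 1, 1], [0, 0, 0]))
```

With two ABBA sites and one BABA site, the estimate is D = (2 - 1)/(2 + 1) = 1/3. A positive D indicates an excess of derived alleles shared by P2 and P3.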
Detailed results from each dataset. Description: Input parameters, sensitivity, significance information and linear regression of the f-statistics, from all datasets in three simulation schemes. (XLSX 214 kb)
Several methods have been proposed to test for introgression across genomes. One method tests for a genome-wide excess of shared derived alleles between taxa using Patterson's D statistic, but does not establish which loci show such an excess or whether the excess is due to introgression or ancestral population structure. Several recent studies have extended the use of D by applying the statistic to small genomic regions, rather than genome-wide. Here, we use simulations and whole genome data from Heliconius butterflies to investigate the behavior of D in small genomic regions. We find that D is unreliable in this situation as it gives inflated values when effective population size is low, causing D outliers to cluster in genomic regions of reduced diversity. As an alternative, we propose a related statistic f̂d, a modified version of a statistic originally developed to estimate the genome-wide fraction of admixture. f̂d is not subject to the same biases as D, and is better at identifying introgressed loci. Finally, we show that both D and f̂d outliers tend to cluster in regions of low absolute divergence (dXY), which can confound a recently proposed test for differentiating introgression from shared ancestral variation at individual loci.
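The f̂d statistic described above can be sketched in the same frequency notation. Under its usual definition, the ABBA-minus-BABA sum S(P1, P2, P3, O) is divided by the value it would take under complete introgression. At each site, both P2 and P3 are replaced by whichever has the higher derived-allele frequency (the putative donor, PD).

This is a minimal sketch for one genomic window. The function names are hypothetical. In practice f̂d is usually reported only for windows where D is positive, and that filtering step is left out here.

```python
from typing import Sequence


def s_stat(p1: Sequence[float], p2: Sequence[float],
           p3: Sequence[float], po: Sequence[float]) -> float:
    """ABBA minus BABA, summed over sites, from derived-allele frequencies."""
    return sum((1 - a) * b * c * (1 - o) - a * (1 - b) * c * (1 - o)
               for a, b, c, o in zip(p1, p2, p3, po))


def fd_hat(p1: Sequence[float], p2: Sequence[float],
           p3: Sequence[float], po: Sequence[float]) -> float:
    """Hypothetical helper: f-hat-d = S(P1,P2,P3,O) / S(P1,PD,PD,O) for one window.

    PD is chosen per site as whichever of P2 or P3 carries the higher
    derived-allele frequency.
    """
    if not (len(p1) == len(p2) == len(p3) == len(po)):
        raise ValueError("all frequency lists must cover the same sites")
    pd = [max(b, c) for b, c in zip(p2, p3)]
    denom = s_stat(p1, pd, pd, po)
    if denom == 0:
        raise ValueError("denominator is zero; f-hat-d is undefined here")
    return s_stat(p1, p2, p3, po) / denom


# If P2 and P3 carry identical frequencies, the window looks fully introgressed
print(fd_hat([0.0, 0.2], [0.8, 0.5], [0.8, 0.5], [0.0, 0.0]))
```

Because the denominator uses the donor frequency at every site, f̂d stays on a 0 to 1 scale in windows with positive D. The raw D value, by contrast, can reach extreme values wherever few informative sites are present.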
The role of interspecific hybridization has recently seen increasing attention, especially in the context of diversification dynamics. Genomic research has now made it abundantly clear that both hybridization and introgression – the exchange of genetic material through hybridization and backcrossing – are far more common than previously thought. Besides cases of ongoing or recent genetic exchange between taxa, an increasing number of studies report “ancient introgression” – referring to results of hybridization that took place in the distant past. However, it is not clear whether commonly used methods for the detection of introgression are applicable to such old systems, given that most of these methods were originally developed for analyses at the level of populations and recently diverged species, affected by recent or ongoing genetic exchange. In particular, the assumption of constant evolutionary rates, which is implicit in many commonly used approaches, is more likely to be violate... Genomic datasets have been simulated with msprime and processed with Dsuite, IQ-TREE, SNaQ, and QuIBL.
Among-species rate variation produces false signals of introgression
Supplementary Material for Koppetsch et al. includes Supplementary Notes S1-S4, Supplementary Figures S1-S36, and Supplementary Tables S1-S5.
Supplementary Table 1 is provided in file Supplementary_Table_1.xlsx, while all other Supplementary Material is included in file dstats_supplement.pdf.
This Excel spreadsheet contains nested sheets that correspond to all the summary content that would be obtained after running the script run_all_simulation_data.sh. All scripts are provided on the GitHub code repository (https://github.com/thorekop/ABBA-Site-Clustering/blob/main/src/). Please note that the run_all_simulation_data.sh script is not intended to be executed itself.
Abbreviations for the variables and parameters listed in Supplementary Table 1 are defined here:
When multiple speciation events occur rapidly in succession, discordant genealogies due to incomplete lineage sorting (ILS) can complicate the detection of introgression. A variety of methods, including the D-statistic (a.k.a. the “ABBA–BABA test”), have been proposed to infer introgression in the presence of ILS for a four-taxon clade. However, no integrated method exists to detect introgression using allelic patterns for more complex phylogenies. Here we explore the issues associated with previous systems of applying D-statistics to a larger tree topology, and propose new DFOIL tests as an integrated framework to infer both the taxa involved in and the direction of introgression for a symmetric five-taxon phylogeny. Using theory and simulations, we show that the DFOIL statistics correctly identify the introgression donor and recipient lineages, even at low rates of introgression. DFOIL is also shown to have extremely low false-positive rates. The DFOIL tests are computationally inexpe...
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Fs, Fu's Fs test statistic; D*, Fu and Li's D* test statistic; F*, Fu and Li's F* test statistic; Tajima's D, Tajima's test statistic.
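Of the neutrality statistics abbreviated above, Tajima's D has the simplest closed form. It compares the mean number of pairwise differences π with the Watterson-based expectation S/a1, then scales that difference by its standard deviation (Tajima 1989).

This sketch makes three assumptions:
- π is given per locus, not per site.
- There are n ≥ 4 sequences, because for n ≤ 3 the variance term vanishes.
- The function name `tajimas_d` is hypothetical.

```python
import math


def tajimas_d(n: int, seg_sites: int, pi: float) -> float:
    """Tajima's D from sample size n, segregating sites S and mean pairwise differences pi."""
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if seg_sites == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_w = seg_sites / a1  # Watterson's estimator
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - theta_w) / math.sqrt(variance)


print(tajimas_d(4, 11, 6.0))
```

For n = 4, a1 = 11/6, so S = 11 gives S/a1 = 6. Then π = 6 yields D ≈ 0. A larger π (an excess of intermediate-frequency variants) gives a positive D, and a smaller π gives a negative D.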
The evolutionary implications and frequency of hybridization and introgression are increasingly being recognized across the tree of life. To detect hybridization from multi-locus and genome-wide sequence data, a popular class of methods is based on summary statistics from subsets of 3 or 4 taxa. However, these methods often carry the assumption of a constant substitution rate across lineages and genes, which is commonly violated in many groups. In this work, we quantify the effects of rate variation on the D test (also known as ABBA-BABA test), the D3 test, and HyDe. All three tests are used widely across a range of taxonomic groups, in part because they are very fast to compute. We consider rate variation across species lineages, across genes, their lineage-by-gene interaction, and residual variation across gene-tree edges. We do so by simulating gene trees within species networks according to a birth-death-hybridization process so as to capture a range of realistic species phylogenies...
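The D3 test mentioned above works on pairwise distances within a rooted triple ((A, B), C). With constant rates and only incomplete lineage sorting, A and B are expected to be equally distant from C. Introgression between B and C shortens d_BC.

This is also why rate variation matters: if A simply evolves faster than B, d_AC exceeds d_BC even with no introgression. The sketch below makes these assumptions:
- It uses the sign convention (d_BC - d_AC)/(d_BC + d_AC), so B-C introgression pushes D3 negative. Conventions may differ between implementations.
- Distances are uncorrected p-distances on gap-free aligned sequences.
- The function names are hypothetical.

```python
def p_distance(x: str, y: str) -> float:
    """Proportion of differing positions between two aligned, gap-free sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(1 for a, b in zip(x, y) if a != b) / len(x)


def d3(seq_a: str, seq_b: str, seq_c: str) -> float:
    """Hypothetical D3 for ((A, B), C): (d_BC - d_AC) / (d_BC + d_AC)."""
    d_ac = p_distance(seq_a, seq_c)
    d_bc = p_distance(seq_b, seq_c)
    if d_ac + d_bc == 0:
        raise ValueError("C is identical to both A and B")
    return (d_bc - d_ac) / (d_bc + d_ac)


print(d3("AAAA", "AAAT", "TTAA"))
```

In this toy example B carries one extra substitution, so d_BC = 0.75 and d_AC = 0.5, giving D3 = 0.25/1.25 = 0.2. That nonzero value arises purely from unequal branch lengths, which is the kind of false signal rate variation can produce.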
The CMS Program Statistics - Medicare Part D tables provide use and Part D drug costs by type of Part D plan (stand-alone prescription drug plan and Medicare Advantage prescription drug plan). For additional information on enrollment, providers, and Medicare use and payment, visit the CMS Program Statistics page. These data do not exist in a machine-readable format, so the view data and API options are not available. Please use the download function to access the data. Below is the list of tables:
MDCR UTLZN D 1. Medicare Part D Utilization: Average Annual Prescription Drug Fills by Type of Plan, Low Income Subsidy (LIS) Eligibility, and Generic Dispensing Rate, Yearly Trend
MDCR UTLZN D 2. Medicare Part D Utilization: Average Annual Gross Drug Costs Per Part D Enrollee, by Type of Plan, Low Income Subsidy (LIS) Eligibility, and Brand/Generic Drug Classification, Yearly Trend
MDCR UTLZN D 3. Medicare Part D Utilization: Average Annual Gross Drug Costs Per Part D Enrollee, by Type of Plan, Low Income Subsidy (LIS) Eligibility, and Brand/Generic Drug Classification, Yearly Trend
MDCR UTLZN D 4. Medicare Part D Utilization: Average Annual Prescription Drug Fills and Average Annual Gross Drug Cost Per Part D Enrollee, by Type of Plan and Demographic Characteristics
MDCR UTLZN D 5. Medicare Part D Utilization: Average Annual Prescription Drug Fills and Average Annual Gross Drug Cost Per Part D Utilizer, by Type of Plan and Demographic Characteristics
MDCR UTLZN D 6. Medicare Part D Utilization: Average Annual Prescription Drug Fills and Average Annual Gross Drug Cost Per Part D Enrollee, by Type of Plan, by Area of Residence
MDCR UTLZN D 7. Medicare Part D Utilization: Average Annual Prescription Drug Fills and Average Annual Gross Drug Cost Per Part D Utilizer, by Type of Plan, by Area of Residence
MDCR UTLZN D 8. Medicare Part D Utilization: Number of Part D Utilizers and Average Annual Prescription Drug Fills by Type of Part D Plan, Low Income Subsidy (LIS) Eligibility, and Part D Coverage Phase, Yearly Trend
MDCR UTLZN D 9. Medicare Part D Utilization: Number of Part D Utilizers and Drug Costs by Type of Part D Plan, Low Income Subsidy (LIS) Eligibility, and Part D Coverage Phase, Yearly Trend
MDCR UTLZN D 10. Medicare Part D Utilization: Number of Part D Utilizers, Average Annual Prescription Drug Events (Fills) and Average Annual Gross Drug Cost Per Part D Utilizer, by Part D Coverage Phase and Demographic Characteristics
MDCR UTLZN D 11. Medicare Part D Utilization: Number of Part D Utilizers, Average Annual Prescription Drug Fills and Average Annual Gross Drug Cost Per Part D Utilizer, by Part D Coverage Phase and Area of Residence
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Distribution of gene tree divergence times for all goose species. All distributions show a single peak, indicating gene flow during divergence. The divergence time of several gene trees was close to zero, suggesting low levels of recent gene flow between certain species. The final three figures represent the three subspecies of Brent Goose, which is depicted in the lower right panel. (ZIP 2715 kb)
Over the past 15 years, the D-statistic, a four-taxon test for organismal admixture (hybridization, or introgression) which incorporates single nucleotide polymorphism data with allelic patterns ABBA and BABA, has seen considerable use. This statistic seeks to discern significant deviation from either a given species tree assumption, or from the balanced incomplete lineage sorting that could otherwise defy this species tree. However, while the D-statistic can successfully discriminate admixture from incomplete lineage sorting, it is not a simple matter to determine the directionality of admixture using only four-leaf tree models. As such, methods have been developed that use 5 leaves to evaluate admixture. Among these, the DFOIL method, which tests allelic patterns on the “symmetric” tree S = (((1,2),(3,4)),5), succeeds in finding admixture direction for many five-taxon examples. However, DFOIL does not make full use of all symmetry, nor can DFOIL function properly when ancient samples ...
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
ABBA/BABA (D-statistics) and f4-ratio data generated by Dsuite's Dtrios function.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
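To show what an f4-ratio estimates, the sketch below uses one common form, α = f4(A, O; X, C) / f4(A, O; B, C). Here X is modeled as a mixture of a lineage related to B (proportion α) and a lineage related to C. Dsuite computes its own f4-ratio formulation over trios, so this is an illustration of the idea rather than its implementation. The function names and the toy frequencies are hypothetical.

```python
from typing import Sequence


def f4(pa: Sequence[float], pb: Sequence[float],
       pc: Sequence[float], pd: Sequence[float]) -> float:
    """f4(A, B; C, D): mean over sites of (pA - pB) * (pC - pD)."""
    if not (len(pa) == len(pb) == len(pc) == len(pd)) or not pa:
        raise ValueError("frequency lists must be non-empty and equal length")
    return sum((a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)) / len(pa)


def f4_ratio(pa: Sequence[float], po: Sequence[float], px: Sequence[float],
             pb: Sequence[float], pc: Sequence[float]) -> float:
    """Hypothetical helper: alpha = f4(A, O; X, C) / f4(A, O; B, C)."""
    return f4(pa, po, px, pc) / f4(pa, po, pb, pc)


# Toy allele frequencies; X is built as an exact 30:70 mix of B and C
pa, po = [0.1, 0.6, 0.3], [0.0, 0.0, 0.0]
pb, pc = [0.9, 0.2, 0.7], [0.2, 0.5, 0.1]
alpha = 0.3
px = [alpha * b + (1 - alpha) * c for b, c in zip(pb, pc)]
print(f4_ratio(pa, po, px, pb, pc))
```

Because f4 is linear in each argument, substituting X = αB + (1 - α)C gives f4(A, O; X, C) = α·f4(A, O; B, C). The ratio therefore recovers α = 0.3 up to floating-point rounding.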
Introgression is now commonly reported in studies across the Tree of Life, aided by recent advancements in data collection and analysis. Nevertheless, researchers working with non‐model species lacking reference genomes may be stymied by a mismatch between available resources and methodological demands. In this study, we demonstrate a fast and simple approach for inferring introgression using RADseq data, and apply it to a case study involving spiny lizards (Sceloporus) from northeastern México. First, we find evidence for recurrent mtDNA introgression between the two focal species based on patterns of mito‐nuclear discordance. We then test for nuclear introgression by exhaustively applying the "five‐taxon" D‐statistic (DFOIL) to all relevant individuals sampled for RADseq data. In our case, this exhaustive approach (dubbed "ExDFOIL") entails testing up to ~250,000 unique four‐taxon combinations of individuals across species. To facilitate use of this ExDFOIL approach, we provide scripts for many relevant tasks, including the selection of appropriate four‐taxon combinations, execution of DFOIL tests in parallel, and visualization of introgression results in phylogenetic and geographic space. Using ExDFOIL, we find evidence for ancient introgression between the focal species. Furthermore, we reveal geographic variation in patterns of introgression that is consistent with patterns of mito‐nuclear discordance and with recurrent introgression. Overall, our study demonstrates that the combination of DFOIL and RADseq data can effectively detect introgression under a variety of sampling conditions (for individuals, populations, and loci). Importantly, we also find evidence that batch‐specific error and linkage in RADseq data may mislead inferences of introgression under certain conditions.
License: Public Domain Mark, https://creativecommons.org/share-your-work/public-domain/pdm
This data set provides statistics about employer and nonemployer businesses from 2020 for the nation, states, and metropolitan statistical areas (MSA). It includes the number of firms, revenue, number of employees, and annual payroll, broken down by industry and owner demographics, including sex, ethnicity, race, and veteran status.
About NES-D
The Nonemployer Statistics by Demographics series (NES-D) provides information on the demographic characteristics of nonemployer businesses. The NES-D is the result of a research project by the Census Bureau to complete the picture of U.S. business ownership by demographics for the United States. Historically, the quinquennial Survey of Business Owners (SBO) provided the only comprehensive source of information on both employer and nonemployer businesses by demographic characteristics of the business owners. In 2017, the SBO was replaced by the Annual Business Survey (ABS). The ABS is an annual survey that collects demographic characteristics from employer businesses. However, the ABS excludes the collection of demographic data from nonemployer businesses. The NES-D was developed to produce similar estimates as ABS on owner demographics for nonemployer businesses. The NES-D is not a survey; rather, it leverages existing individual-level administrative records to assign demographic characteristics to the universe of nonemployer businesses. Demographic characteristics including sex, ethnicity, race, veteran status, owner age, place of birth, and U.S. citizenship are assigned to nonemployer business owners.
Together, the NES-D and the ABS will continue to provide the only source of detailed and comprehensive statistics on the scope, nature and activities of all U.S. businesses by the demographic characteristics of the business owners. NES-D data will be available annually by detailed geography and industry levels, receipt-size class, and legal form of organization (LFO).
Beginning with the 2019 NES-D, the data will include urban and rural classification.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
Background: Studying patterns of introgression can illuminate the role of hybridization in speciation, and help guide decisions relevant to the conservation of rare taxa. Vipera magnifica and Vipera orlovi are small vipers that have high conservation status due to their rarity and restricted distributions in an area of the Caucasus region where two other related species are present – V. kaznakovi and V. renardi. Despite numerous observations of hybridization between different species of small vipers, and the potential of a hybrid origin for V. magnifica and V. orlovi based on their distribution with respect to V. kaznakovi and V. renardi, hypotheses of a hybrid origin have not been formally tested. Here we generate genomic-scale data by performing next-generation sequencing of double-digest restriction-site associated DNA libraries, and use these multilocus data to test whether these two species are of hybrid origin. Results: We generated over nine hundred loci for 38 specimens of six taxa, and analysed the dataset using Bayesian clustering and multivariate methods, as well as Patterson D-statistics, which can distinguish between incomplete lineage sorting and introgression as explanations for shared polymorphism. The results demonstrate a pattern of historical admixture in the two purported hybrids that is consistent with past gene flow from V. renardi into V. kaznakovi. The average admixture proportion in individuals was low (6.39 %) in the case of V. magnifica, but was higher in V. orlovi (19.02 %). We also show that the specific individual samples used in D-statistic tests can have a significant impact on inferences regarding the magnitude of introgression, suggesting the importance of including multiple individuals in these analyses. Conclusions: Our results support the conclusion that both V. orlovi and V. magnifica formed through a hybridization event between V. kaznakovi and V. renardi.
Given the low proportion of admixture and the absence of clear ecological and morphological differences, V. magnifica should be treated as a marginal population of V. kaznakovi. Further studies that include analyses of ecological segregation of V. orlovi from parental taxa and a search for evolutionary consequences of hybridisation would clarify whether V. orlovi is a distinct hybrid species. Until then, we recommend preserving the current taxonomy and protection status of V. orlovi.
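Significance for D-statistic tests such as these is commonly assessed with a delete-one-block jackknife over genomic blocks, or, for RADseq data, over groups of loci. D is recomputed with each block left out, and the spread of those values yields a standard error and a Z-score.

This is a minimal sketch that assumes ABBA and BABA counts have already been tallied per block. The block definition and the function names are hypothetical, and this is not presented as the procedure used in the study above.

```python
import math
from typing import Sequence, Tuple


def d_from_counts(abba: float, baba: float) -> float:
    """Patterson's D from ABBA and BABA totals."""
    return (abba - baba) / (abba + baba)


def jackknife_d(block_abba: Sequence[float],
                block_baba: Sequence[float]) -> Tuple[float, float, float]:
    """Hypothetical helper returning (D, jackknife standard error, Z-score)."""
    n = len(block_abba)
    if n != len(block_baba) or n < 2:
        raise ValueError("need matching per-block counts for at least 2 blocks")
    total_abba, total_baba = sum(block_abba), sum(block_baba)
    d_full = d_from_counts(total_abba, total_baba)
    # D recomputed with each block left out in turn
    loo = [d_from_counts(total_abba - a, total_baba - b)
           for a, b in zip(block_abba, block_baba)]
    mean_loo = sum(loo) / n
    se = math.sqrt((n - 1) / n * sum((d - mean_loo) ** 2 for d in loo))
    z = d_full / se if se > 0 else float("nan")
    return d_full, se, z


print(jackknife_d([10, 12, 9, 11], [5, 6, 4, 5]))
```

Blocks are typically chosen to be large enough that linkage between them is negligible. A |Z| above about 3 is a frequently used, but arbitrary, threshold for calling D significant.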
In 2024, there were approximately ** million beneficiaries enrolled in the Medicare Part D plan in the United States. This statistic illustrates the total number of beneficiaries enrolled in the Medicare Part D plan in the United States from 2006 to 2024.
In 2024, **** percent of Medicare's Part D beneficiaries were insured through United Healthcare. Part D covers prescription drugs and must be separately enrolled for beneficiaries in traditional Medicare plans in the United States. This statistic shows the distribution of Medicare Part D enrollment in 2024, by firm.
Phylogenetic relationships among recently diverged species are often difficult to resolve due to insufficient phylogenetic signal in available markers and/or conflict among gene trees. Here we explore the use of reduced-representation genome sequencing, specifically in the form of restriction-site associated DNA (RAD), for phylogenetic inference and the detection of ancestral hybridization in non-model organisms. As a case study, we investigate Pedicularis section Cyathophora, a systematically recalcitrant clade of flowering plants in the broomrape family (Orobanchaceae). Two methods of phylogenetic inference, maximum likelihood and Bayesian concordance, were applied to data sets that included as many as 40,000 RAD loci. Both methods yielded similar topologies that included two major clades: a “rex-thamnophila” clade, composed of two species and several subspecies with relatively low floral diversity, and geographically widespread distributions at lower elevations, and a “superba” clade...
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
List of samples used with information about species, locality, date of collection, collector, and institution where collections are stored. (XLS 34 kb)
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
Despite being dominant elements of understory communities in the coniferous forests of western North America, phylogenetic relationships among bilberries (Vaccinium section Myrtillus) remain unresolved. Morphological delimitation among most western bilberry species is tenuous and traditionally employed molecular sources of phylogenetic information have yielded insufficient variability. Moreover, these species are hypothesized to have undergone extensive introgression. We used RADseq data analyzed under maximum likelihood species tree estimation to examine the influence of introgression on relationships among Vaccinium myrtillus, V. scoparium, and V. caespitosum. Additionally, we used these data to assess whether the populations of V. myrtillus disjunct between North America and Eurasia are monophyletic and should continue to be recognized as conspecific. Significant genome-wide introgression, as determined through D-statistic analyses, was detected between North American samples of V. myrtillus and V. caespitosum, and to a lesser extent, between V. myrtillus and V. scoparium. No significant D-values were detected between V. scoparium and V. caespitosum. Accessions of Vaccinium myrtillus from Eurasia and North America were recovered as non-monophyletic, prompting our proposed resurrection of V. oreophilum for North American material. The long-assumed sister species relationship between V. oreophilum and V. scoparium was not recovered in our analysis. Instead, V. oreophilum and V. caespitosum were inferred to be sister taxa. This study reveals considerable introgression detectable in the evolutionary history of western North American bilberries and demonstrates the utility of RADseq data to resolve species level relationships in groups that undergo reticulate evolution such as Vaccinium.
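When each taxon is represented by a single haploid sequence, the ABBA and BABA counts behind a D-statistic reduce to classifying aligned sites. This sketch illustrates the idea and is not the pipeline used in the study above. It makes these assumptions:
- The outgroup base is treated as ancestral.
- Only biallelic sites with unambiguous uppercase A/C/G/T calls are kept.
- The two discordant patterns are counted for the tree (((P1, P2), P3), O).
- The function name `count_abba_baba` is hypothetical.

```python
from typing import Tuple

VALID_BASES = set("ACGT")


def count_abba_baba(p1: str, p2: str, p3: str, out: str) -> Tuple[int, int]:
    """Hypothetical helper counting ABBA and BABA sites in four aligned sequences."""
    if not (len(p1) == len(p2) == len(p3) == len(out)):
        raise ValueError("sequences must be aligned to the same length")
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, out):
        # Skip gaps, Ns and ambiguity codes
        if any(x not in VALID_BASES for x in (a, b, c, o)):
            continue
        # Keep biallelic sites only
        if len({a, b, c, o}) != 2:
            continue
        # ABBA: P1 ancestral, P2 and P3 share the derived allele
        if a == o and b != o and c == b:
            abba += 1
        # BABA: P2 ancestral, P1 and P3 share the derived allele
        elif b == o and a != o and c == a:
            baba += 1
    return abba, baba


print(count_abba_baba("ACGTA", "TAGTG", "TCGAG", "AAGAA"))
```

The example alignment has two ABBA sites, one BABA site, one invariant site, and one BBAA site. BBAA sites are concordant with the species tree and therefore not counted. The resulting counts feed directly into D = (ABBA - BABA)/(ABBA + BABA).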
The Nonemployer Statistics by Demographics (NES-D): Company Summary estimates provide economic data classified by sex, ethnicity, race, and veteran status of nonemployer firms. The NES-D is not a survey; rather, it leverages existing administrative records to assign demographic characteristics to the universe of nonemployer businesses. The nonemployer universe comprises businesses with no paid employment or payroll, annual receipts of $1,000 or more ($1 or more in the construction industries), and filing IRS tax forms for sole proprietorships (Form 1040, Schedule C), partnerships (Form 1065), or corporations (the Form 1120 series). Data for all firms are also presented. These estimates are produced by combining estimates for nonemployer firms from the Nonemployer Statistics by Demographics (NES-D) and employer firms from the Annual Business Survey (ABS).