Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets with simulated error, derived from Datasets 1-5. There are 1,000 datasets per combination of error level and skewness type.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The second parity rule states that, if there is no bias in mutation or selection, then within each strand of DNA complementary bases are present at approximately equal frequencies. In bacteria, however, there is commonly an excess of G (over C) and, to a lesser extent, T (over A) in the replicatory leading strand. The low G+C Firmicutes, such as Staphylococcus aureus, are unusual in displaying an excess of A over T on the leading strand. As mutation has been established as a major force in the generation of such skews across various bacterial taxa, this anomaly has been assumed to reflect unusual mutation biases in Firmicute genomes. Here we show that this is not the case and that mutation bias does not explain the atypical AT skew seen in S. aureus. First, recently arisen intergenic SNPs predict the classical replication-derived equilibrium enrichment of T relative to A, contrary to what is observed. Second, sites predicted to be under weak purifying selection display only weak AT skew. Third, AT skew is primarily associated with largely non-synonymous first and second codon sites and is seen with respect to their sense direction, not which replicating strand they lie on. The atypical AT skew we show to be a consequence of the strong bias for genes to be co-oriented with the replicating fork, coupled with the selective avoidance of both stop codons and costly amino acids, which tend to have T-rich codons. That intergenic sequence has more A than T, while at mutational equilibrium a preponderance of T is expected, points to a possible further unresolved selective source of skew.
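As a minimal sketch of the quantities discussed in the abstract above, AT skew and GC skew on a single strand are commonly computed as (A - T)/(A + T) and (G - C)/(G + C). The function name `base_skews` and the example sequence are assumptions made for illustration; they are not taken from the study.

```python
from collections import Counter


def base_skews(seq: str) -> dict:
    """Return AT and GC skew for one DNA strand, read 5'->3'.

    Positive AT skew means more A than T on this strand;
    positive GC skew means more G than C.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    # Guard against division by zero for sequences lacking a base pair type.
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return {"AT": at_skew, "GC": gc_skew}


# Made-up sequence, used only to show the call.
example = "ATGGCGAAAGGTAAAGGCGAATAA"
print(base_skews(example))
```

Under the second parity rule both values would be close to zero within a strand; the abstract concerns systematic departures from that expectation.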
U.S. Government Works https://www.usa.gov/government-works
This dataset contains site information, basin characteristics, results of flood-frequency analysis, and a generalized (regional) flood skew for 76 selected streamgages operated by the U.S. Geological Survey (USGS) in the upper White River basin (4-digit hydrologic unit 1101) in southern Missouri and northern Arkansas. The Little Rock District U.S. Army Corps of Engineers (USACE) needed updated estimates of streamflows corresponding to selected annual exceedance probabilities (AEPs) and a basin-specific regional flood skew. USGS selected 111 candidate streamgages in the study area that had 20 or more years of gaged annual peak-flow data available through the 2020 water year. After screening for regulation, urbanization, redundant/nested basins, drainage areas greater than 2,500 square miles, and streamgage basins located in the Mississippi Alluvial Plain (8-digit hydrologic unit 11010013), 77 candidate streamgages remained. After conducting the initial flood-frequency analysis ...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Genome-wide analysis of gene expression or protein binding patterns using different array or sequencing based technologies is now routinely performed to compare different populations, such as treatment and reference groups. It is often necessary to normalize the data obtained to remove technical variation introduced in the course of conducting experimental work, but standard normalization techniques are not capable of eliminating technical bias in cases where the distribution of the truly altered variables is skewed, i.e. when a large fraction of the variables are either positively or negatively affected by the treatment. However, several experiments are likely to generate such skewed distributions, including ChIP-chip experiments for the study of chromatin, gene expression experiments for the study of apoptosis, and SNP-studies of copy number variation in normal and tumour tissues. A preliminary study using spike-in array data established that the capacity of an experiment to identify altered variables and generate unbiased estimates of the fold change decreases as the fraction of altered variables and the skewness increases. We propose the following work-flow for analyzing high-dimensional experiments with regions of altered variables: (1) Pre-process raw data using one of the standard normalization techniques. (2) Investigate if the distribution of the altered variables is skewed. (3) If the distribution is not believed to be skewed, no additional normalization is needed. Otherwise, re-normalize the data using a novel HMM-assisted normalization procedure. (4) Perform downstream analysis. Here, ChIP-chip data and simulated data were used to evaluate the performance of the work-flow. It was found that skewed distributions can be detected by using the novel DSE-test (Detection of Skewed Experiments). 
Furthermore, applying the HMM-assisted normalization to experiments where the distribution of the truly altered variables is skewed results in considerably higher sensitivity and lower bias than can be attained using standard and invariant normalization methods.
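The claim above, that standard normalization leaves a bias when the truly altered variables are skewed in one direction, can be illustrated with a small simulation. This is only a sketch under stated assumptions: median centering stands in for "a standard normalization technique", the log-ratios are simulated Gaussian values, and the names `median_center` and `bias_after_centering` are hypothetical. It does not implement the HMM-assisted normalization or the DSE-test.

```python
import random
import statistics


def median_center(values):
    """Subtract the overall median, a simple stand-in for standard normalization."""
    m = statistics.median(values)
    return [v - m for v in values]


def bias_after_centering(fraction_altered=0.4, shift=2.0, n=10000, seed=1):
    """Mean log-ratio of the unaltered variables after median centering.

    Unaltered variables have a true log-ratio of 0, so any non-zero
    result is bias introduced by the normalization step.
    """
    rng = random.Random(seed)
    n_alt = int(n * fraction_altered)
    unaltered = [rng.gauss(0.0, 1.0) for _ in range(n - n_alt)]
    # All altered variables are shifted in the same (positive) direction.
    altered = [rng.gauss(shift, 1.0) for _ in range(n_alt)]
    centered = median_center(unaltered + altered)
    return statistics.fmean(centered[: n - n_alt])


print("no altered variables:", round(bias_after_centering(0.0), 3))
print("40% altered, all up: ", round(bias_after_centering(0.4), 3))
```

With no altered variables the unaltered group stays centered near zero; when a large fraction is shifted in one direction, the median moves with it and the unaltered variables are pushed below zero.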
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This project examines whether people have an intrinsic preference for negatively skewed or positively skewed information structures and how these preferences relate to intrinsic preferences for informativeness. It reports results from 5 studies (3 lab experiments, 2 online studies).
"NewEngland_pkflows.PRT" is a text file that contains results of flood-frequency analysis of annual peak flows from 186 selected streamflow gaging stations (streamgages) operated by the U.S. Geological Survey (USGS) in the New England region (Maine, Connecticut, Massachusetts, Rhode Island, New York, New Hampshire, and Vermont). Only streamgages in the region that were also in the USGS "GAGES II" database (https://water.usgs.gov/GIS/metadata/usgswrd/XML/gagesII_Sept2011.xml) were considered for use in the study. The file was generated by combining PeakFQ output (.PRT) files created using version 7.0 of USGS software PeakFQ (https://water.usgs.gov/software/PeakFQ/; Veilleux and others, 2014) to conduct flood-frequency analyses using the Expected Moments Algorithm (England and others, 2018). The peak-flow files used as input to PeakFQ were obtained from the USGS National Water Information System (NWIS) database (https://nwis.waterdata.usgs.gov/usa/nwis/peak) and contained annual peak flows ending in water year 2011. Results of the flood-frequency analyses were used to estimate skewness of annual peak flows in the New England region using Bayesian Weighted Least Squares / Bayesian Generalized Least Squares (B-WLS / B-GLS) regression (Veilleux and others, 2019).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This section presents a discussion of the research data. The data were received as secondary data; however, they were originally collected using time study techniques. Data validation is a crucial step in the data analysis process to ensure that the data are accurate, complete, and reliable. Descriptive statistics were used to validate the data. The mean, mode, standard deviation, variance and range provide a summary of the data distribution and assist in identifying outliers or unusual patterns. The dataset shows the measures of central tendency, which include the mean, median and mode. The mean is the average value of each of the factors presented in the tables. It is the balance point of the dataset and represents its typical value and behaviour. The median is the middle value of the dataset for each of the factors presented: half of the values lie below it and the other half lie above it. This makes it important for skewed distributions. The mode is the most common value in the dataset and was used to describe the most typical observation. These values are important because they describe the central value around which the data are distributed. The mean, mode and median indicate a skewed distribution, as they are neither equal nor close to one another. The dataset also presents the results and a discussion of the results. This section focuses on the customisation of the DMAIC (Define, Measure, Analyse, Improve, Control) framework to address the specific concerns outlined in the problem statement. To gain a comprehensive understanding of the current process, value stream mapping was employed, complemented by measurement of the factors that contribute to inefficiencies. These factors are then analysed and ranked by impact using factor analysis. 
To mitigate the impact of the most influential factor on project inefficiencies, a solution is proposed using the EOQ (Economic Order Quantity) model. The implementation of the 'CiteOps' software facilitates improved scheduling, monitoring, and task delegation in the construction project through digitalisation. Furthermore, project progress and efficiency are monitored remotely and in real time. In summary, the DMAIC framework was tailored to suit the requirements of the specific project, incorporating techniques from inventory management, project management, and statistics to effectively minimise inefficiencies within the construction project.
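The comparison of mean, median and mode described in this entry can be sketched as follows. The function name `central_tendency` and the task durations are made up for illustration and do not come from the dataset. Comparing the mean with the median is only a rule of thumb for the direction of skew, not a formal test.

```python
import statistics


def central_tendency(values):
    """Return mean, median, mode and a rough hint about skew direction."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    mode = statistics.mode(values)  # most common value (first one if tied)
    if mean > median:
        hint = "right (positive) skew suggested"
    elif mean < median:
        hint = "left (negative) skew suggested"
    else:
        hint = "no skew suggested by mean vs. median"
    return {"mean": mean, "median": median, "mode": mode, "hint": hint}


# Hypothetical task durations in minutes, as a time study might record them.
task_minutes = [12, 14, 14, 15, 16, 18, 25, 40]
print(central_tendency(task_minutes))
```

In this made-up sample a few long durations pull the mean above the median and the mode, which is the pattern the description associates with a skewed distribution.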
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
We generate 10 model trees for a given number of genomes. The number of false positives (FP), the number of false negatives (FN), and the execution time (time) in a cell are averages over the finished computations (finished: the number of computations that finished within 24 hours) out of 10 trials using 10 different model trees. Times in the tables are given in hours, minutes, and seconds. The number of genes in a genome is 100 in our experiments.
This dataset contains annual peak-flow data, PeakFQ specifications, and results of flood-frequency analyses of annual peak flows for 368 selected streamflow gaging stations (streamgages) operated by the U.S. Geological Survey (USGS) in the Great Lakes and Ohio River basins. "PeakFQinput_all.txt" contains annual peak-flow data, ending in water year 2013, for all 368 streamgages in the study area. Annual peak-flow data were obtained from the USGS National Water Information System (NWIS) database (https://nwis.waterdata.usgs.gov/usa/nwis/peak). "PeakFQspec_all.psf" contains PeakFQ specifications for all 368 streamgages in the study area. The specifications were developed by hydrologists in the various USGS Water Science Centers that participated in the study. "PeakFQoutput_all.PRT" contains the results of flood-frequency analyses of annual peak-flow data, for each of the 368 streamgages in the study area, that were conducted using the Expected Moments Algorithm (England and others, 2018). Using the annual peak-flow data in "PeakFQinput_all.txt" and the specifications in "PeakFQspec_all.psf", "PeakFQoutput_all.PRT" was generated in version 7.2 of USGS flood-frequency analysis software PeakFQ (https://water.usgs.gov/software/PeakFQ/; Veilleux and others, 2014). Results of the flood-frequency analyses were used to estimate regional skew for the study area using Bayesian Weighted Least Squares / Bayesian Generalized Least Squares (B-WLS / B-GLS) regression.
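For orientation, the station skew that such regional analyses build on is the skewness coefficient of the base-10 logarithms of the annual peak flows. The sketch below uses the plain method-of-moments sample skew; it is not the Expected Moments Algorithm used by PeakFQ, and it does not perform B-WLS / B-GLS regression. The function name `sample_skew` and the peak-flow values are assumptions made for illustration.

```python
import math


def sample_skew(values):
    """Bias-adjusted sample skewness: n * sum(d^3) / ((n-1)(n-2) s^3)."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three values are required")
    mean = sum(values) / n
    # Sample standard deviation with the n - 1 denominator.
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if s == 0:
        raise ValueError("skew is undefined when all values are equal")
    cubed = sum((v - mean) ** 3 for v in values)
    return n * cubed / ((n - 1) * (n - 2) * s ** 3)


# Hypothetical annual peak flows (cubic feet per second), not from the dataset.
peaks_cfs = [1200.0, 950.0, 3100.0, 1800.0, 780.0, 2250.0, 1500.0, 6400.0]
log_peaks = [math.log10(q) for q in peaks_cfs]
print("skew of log10 peaks:", round(sample_skew(log_peaks), 3))
```

Station skew from short records is highly variable, which is why a regional skew is estimated from many streamgages.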
https://spdx.org/licenses/CC0-1.0.html
Quantitative-genetic models of differentiation under migration-selection balance often rely on the assumption of normally distributed genotypic and phenotypic values. When a population is subdivided into demes with selection toward different local optima, migration between demes may result in asymmetric, or skewed, local distributions. Using a simplified two-habitat model, we derive formulas without a priori assuming a Gaussian distribution of genotypic values, and we find expressions that naturally incorporate higher moments, such as skew. These formulas yield predictions of the expected divergence under migration-selection balance that are more accurate than models assuming Gaussian distributions, which illustrates the importance of incorporating these higher moments to assess the response to selection in heterogeneous environments. We further show with simulations that traits with loci of large effect display the largest skew in their distribution at migration-selection balance.
This dataset contains site information, results of at-site flood-frequency analysis, and results of Bayesian weighted least-squares/Bayesian generalized least-squares (B-WLS/B-GLS) analysis of regional skewness of the annual peak flows for 405 streamflow gaging stations (streamgages) operated by the U.S. Geological Survey (USGS) in parts of the Mid-Atlantic and South-Atlantic-Gulf regions (hydrologic unit codes 02 and 03) in eastern Virginia, eastern West Virginia, and western Maryland. With few exceptions, annual peak-flow data through the 2021 water year (A water year is defined as the period October 1-September 30, named for the year in which it ends) were used in the study. For regional skew analysis, 234 of the 405 candidate streamgages were removed for pseudo record lengths (PRL; Veilleux and Wagner, 2021) less than 50 years, 50 were removed for redundancy, and 9 were removed for regulation, urbanization, or being outside the study area (see file "VAskew_Region1.csv" in this dataset). For the remaining 112 of 405 candidate streamgages, B-WLS/B-GLS regression (Veilleux and Wagner, 2021) was used to relate flood skew to a suite of 32 explanatory variables (see file "BasinCharDefinitions.csv"). None of the explanatory variables tested had sufficient predictive power in explaining the variability in skew in the region; thus, a constant model of regional skew, 0.50 (average variance of prediction 0.33, standard error 0.574) was selected for the study area (Messinger and others, 2025). For the 405 candidate streamgages, annual peak-flow data through the 2021 water year ("VAskew_region1.pkf") and specification ("VAskew_region1.psf"), output ("VASKEW_REGION1.PRT"), and export ("VASKEW_REGION1.EXP") files from flood-frequency analysis in version 7.4.1 of USGS PeakFQ software (hereafter referred to as "PeakFQ"; Veilleux and others, 2014; Flynn and others, 2006) are provided. 
Two .csv files are provided, one describing the basin characteristics tested ("BasinCharsTested.csv") and the other ("VAskew_Region1.csv") containing site information (U.S. Geological Survey, 2023), results of flood-frequency analysis in PeakFQ, and, for the 112 streamgages used in the B-WLS/B-GLS regression, PRL, unbiased at-site skew, the B-WLS/B-GLS residual and metrics of leverage and influence. A geographic information systems (GIS) shapefile (.shp) containing a polygon representing the geographic extent of the skew region is also included.
Simulation scripts and results for both cases (skewness in environmental effects and skewness in the distribution of breeding values), and scripts used to generate Figures 1 and S1.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The effect of changing the random-effects distribution from normal to skew-normal on the regression coefficients, with 95% credible intervals, in the asymptote model.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
The partitioning of reproduction among individuals in communally breeding animals varies greatly among species, from the monopolization of reproduction in some species (high reproductive skew) to similar contributions to the offspring in others (low skew). Reproductive skew models explain how relatedness or ecological constraints affect the magnitude of reproductive skew. They typically assume that individuals are capable of flexibly reacting to social and environmental changes. Most models predict a decrease of skew when benefits of staying in the group are reduced. In the ant Leptothorax acervorum, queens in colonies from marginal habitats form dominance hierarchies and only the top-ranking queen lays eggs ("functional monogyny"). In contrast, queens in colonies from extended coniferous forests throughout the Palaearctic rarely interact aggressively and all lay eggs ("polygyny"). An experimental increase of queen:worker ratios in colonies from low-skew populations elicits queen–queen aggression similar to that in functionally monogynous populations. Here, we show that this manipulation also results in increased reproductive inequalities among queens. Queens from natural overwintering colonies differed in the number of developing oocytes in their ovaries. These differences were greatly augmented in queens from colonies with increased queen:worker ratios relative to colonies with a low queen:worker ratio. As assumed by models of reproductive skew, L. acervorum colonies thus appear to be capable of flexibly adjusting reproductive skew to social conditions, yet in the direction opposite to that predicted by most models.
https://spdx.org/licenses/CC0-1.0.html
Background: Mating systems that reduce dispersal and lead to non-random mating might increase the potential for genetic structure to arise at fine geographic scales. Greater sage-grouse (Centrocercus urophasianus) have a lek-based mating system and exhibit high site fidelity and skewed mating ratios. We quantified population structure by analyzing variation at 27,866 single-nucleotide polymorphisms in 140 males from ten leks (within five lek complexes) occurring in a small geographic region in central Nevada. Results: Lek complexes, and to a lesser extent individual leks, formed statistically identifiable clusters in ordination analyses, providing evidence for fine-scale geographic genetic differentiation. Lek geography predicted genetic differentiation even at a small geographic scale, which could be sharpened by strong site fidelity. Relatedness was also higher among individuals within lek complexes (and leks), suggesting that reproductive skew, where few males participate in most of the successful matings, could also potentially contribute to genetic differentiation. Models incorporating a habitat resistance surface as a proxy for potentially reduced movement due to landscape features indicated that both geographic distance and habitat suitability (i.e. preferred habitat) predicted genetic structure, with no significant effect of man-made barriers to movement (i.e. power lines and roads). Finally, we illustrate how data sets containing fewer loci (<4000) had less statistical precision and failed to detect the full degree of genetic structure. Conclusion: Our results suggest that habitat features and lek site geography of sage-grouse shape fine scale genetic structure, and highlight how larger data sets can have increased precision and accuracy for quantifying ecologically relevant genetic structure over small geographic scales.
Observed phenotypic responses to selection in the wild often differ from predictions based on measurements of selection and genetic variance. An overlooked hypothesis to explain this paradox of stasis is that a skewed phenotypic distribution affects natural selection and evolution. We show through mathematical modelling that, when a trait selected for an optimum phenotype has a skewed distribution, directional selection is detected even at evolutionary equilibrium, where it causes no change in the mean phenotype. When environmental effects are skewed, Lande and Arnold’s (1983) directional gradient is in the direction opposite to the skew. In contrast, skewed breeding values can displace the mean phenotype from the optimum, causing directional selection in the direction of the skew. These effects can be partitioned out using alternative selection estimates based on average derivatives of individual relative fitness, or additive genetic covariances between relative fitness and trait (Robe...
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
In group-living species, the degree of relatedness among group members often governs the extent of reproductive sharing, cooperation and conflict within a group. Kinship among group members can be shaped by the presence and location of neighbouring groups, as these provide dispersal or mating opportunities that can dilute kinship among current group members. Here, we assessed how within-group relatedness varies with the density and position of neighbouring social groups in Neolamprologus pulcher, a colonial and group-living cichlid fish. We used restriction site-associated DNA sequencing (RADseq) methods to generate thousands of polymorphic SNPs. Relative to microsatellite data, RADseq data provided much tighter confidence intervals around our relatedness estimates. These data allowed us to document novel patterns of relatedness in relation to colony-level social structure. First, the density of neighbouring groups was negatively correlated with relatedness between subordinates and dominant females within a group, but no such patterns were observed between subordinates and dominant males. Second, subordinates at the colony edge were less related to dominant males in their group than subordinates in the colony centre, suggesting a shorter breeding tenure for dominant males at the colony edge. Finally, subordinates who were closely related to their same-sex dominant were more likely to reproduce, supporting some restraint models of reproductive skew. Collectively, these results demonstrate that within-group relatedness is influenced by the broader social context, and variation between groups in the degree of relatedness between dominants and subordinates can be explained by both patterns of reproductive sharing and the nature of the social landscape.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
The within-host evolutionary dynamics of TB remain unclear, and underlying biological characteristics render standard population genetic approaches based upon the Wright-Fisher model largely inappropriate. In addition, the compact genome combined with an absence of recombination is expected to result in strong purifying selection effects. Thus, it is imperative to establish a biologically-relevant evolutionary framework incorporating these factors in order to enable an accurate study of this important human pathogen. Further, such a model is critical for inferring fundamental evolutionary parameters related to patient treatment, including mutation rates and the severity of infection bottlenecks. We here implement such a model and infer the underlying evolutionary parameters governing within-patient evolutionary dynamics. Results demonstrate that the progeny skew associated with the clonal nature of TB severely reduces genetic diversity and that the neglect of this parameter in previous studies has led to significant mis-inference of mutation rates. As such, our results suggest an underlying de novo mutation rate that is considerably faster than previously inferred, and a progeny distribution differing significantly from Wright-Fisher assumptions. This inference represents a more appropriate evolutionary null model, against which the periodic effects of positive selection, associated with drug-resistance for example, may be better assessed.
Images were acquired by the DeepSurvey Camera on board GEOMAR's AUV Abyss. Nodules were delineated by the CoMoNoD algorithm [see related references]. Result files are computed per AUV dive. Nodule detections below 5 cm^2 are neglected, as are detections above 707 cm^2. Abundance statistics are computed per m^2 and gridded per m^2 as well. For overlapping images, max-pooling has been applied to select the values reported in the result files. Pixel values in the rendered maps correspond to the units reported in the ASCII files (median-nodule-size: cm^2, nodule-number: m^-2, percent-coverage: %; sorting, skewness and pixel-contributions are unit-free).
The U.S. Geological Survey (USGS) recently completed a report documenting methods for peak-flow frequency analysis following implementation of the Bulletin 17C guidelines (https://doi.org/10.3133/tm4B5). The methods provide estimates of peak-flow quantiles for 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities (AEPs) for selected streamgages operated by the USGS and Environment Canada. In association with the report, this data release presents peak-flow frequency analyses for 148 streamgages (127 stations in Maine, 16 in New Hampshire, and 5 in New Brunswick, Canada). Included are 148 individual ".PRT" text files that contain results of the flood-frequency analyses of annual peak flows from all of the selected streamgages. The files were generated using version 7.3 of USGS software PeakFQ (https://water.usgs.gov/software/PeakFQ/; Veilleux and others, 2014) to conduct flood-frequency analyses using the expected moments algorithm (England and others, 2018). The peak-flow files used as input to PeakFQ were obtained from the USGS National Water Information System (NWIS) database (https://nwis.waterdata.usgs.gov/usa/nwis/peak) and contained annual peak flows ending in water year 2019. Results of the flood-frequency analyses at streamgages that did not have storage or regulation in the watershed (124 of the total 148) were used to develop peak-flow regression equations for estimating the selected AEPs at ungaged sites in Maine. Results from the unregulated streamgages that had periods of record of at least 20 years (51 of the 124) were used for a Maine skew analysis also outlined in the report. This data release also includes eight Excel tables summarizing the results of the peak-flow frequency analyses and peak-flow regionalization. 
Tables include basin characteristics used in the regionalization, information needed for flood-frequency analyses including periods of record used in analyses and skew values, maximum instantaneous floods, flood frequency estimates, information needed for advanced accuracy analyses for the streamgages, and information needed for calculation of the 90-percent confidence intervals of the peak-flow equations for the AEPs.
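The annual exceedance probabilities listed in this entry are often restated as average recurrence intervals, using the standard relation T = 1 / AEP. A minimal sketch follows; the function names `recurrence_interval_years` and `chance_of_exceedance` are hypothetical and not part of the data release.

```python
# AEPs in percent, as listed in the dataset description.
AEP_PERCENT = [50, 20, 10, 4, 2, 1, 0.5, 0.2]


def recurrence_interval_years(aep_percent):
    """Average recurrence interval in years for a given AEP in percent."""
    if not 0 < aep_percent <= 100:
        raise ValueError("AEP must be in the interval (0, 100] percent")
    return 100.0 / aep_percent


def chance_of_exceedance(aep_percent, years):
    """Probability of at least one exceedance in `years` independent years."""
    p = aep_percent / 100.0
    return 1.0 - (1.0 - p) ** years


for aep in AEP_PERCENT:
    interval = recurrence_interval_years(aep)
    print(f"{aep}-percent AEP: about {interval:.0f}-year recurrence interval")
```

The second helper shows why the recurrence interval should not be read as a schedule: under the independence assumption, a 1-percent AEP flood has roughly a one-in-four chance of being exceeded at least once in 30 years.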