Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Example data for normally distributed and skewed datasets.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by peggy
Released under MIT
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A common descriptive statistic in cluster analysis is the $R^2$, which measures the overall proportion of variance explained by the cluster means. This note highlights properties of the $R^2$ for clustering. In particular, we show that the $R^2$ can generally be artificially inflated by linearly transforming the data, either by "stretching" or by projecting. Also, the $R^2$ for clustering will often be a poor measure of clustering quality in high-dimensional settings. We also investigate the $R^2$ for clustering under misspecified models. Several simulation illustrations highlight weaknesses in the clustering $R^2$, especially in high-dimensional settings. A functional data example shows how the $R^2$ for clustering can vary dramatically depending on how the curves are estimated.
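The stretching effect described above can be illustrated with a minimal sketch. It assumes the usual definition $R^2 = 1 - \mathrm{WSS}/\mathrm{TSS}$ (within-cluster over total sum of squares) with the cluster labels held fixed; the data points, the labels, and the function names `clustering_r2` and `stretch` are hypothetical and are not taken from the note.

```python
def clustering_r2(points, labels):
    """R^2 = 1 - WSS/TSS for a fixed assignment of points to clusters."""
    n, dim = len(points), len(points[0])
    grand = [sum(p[d] for p in points) / n for d in range(dim)]
    tss = sum((p[d] - grand[d]) ** 2 for p in points for d in range(dim))
    wss = 0.0
    for k in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == k]
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        wss += sum((p[d] - mean[d]) ** 2 for p in members for d in range(dim))
    return 1.0 - wss / tss


def stretch(points, factor, axis=0):
    """Linearly rescale one coordinate of every point."""
    return [tuple(v * factor if d == axis else v for d, v in enumerate(p))
            for p in points]


# Hypothetical data: two clusters separated along x, with large spread in y.
points = [(0.0, -3.0), (1.0, 3.0), (0.0, 3.0), (1.0, -3.0),
          (4.0, -3.0), (5.0, 3.0), (4.0, 3.0), (5.0, -3.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

r2_before = clustering_r2(points, labels)
r2_after = clustering_r2(stretch(points, 10.0, axis=0), labels)
print(f"R^2 before stretching: {r2_before:.3f}")
print(f"R^2 after stretching x by 10: {r2_after:.3f}")
```

The clusters and their membership are identical in both calls; only the scale of the x coordinate changes, yet the reported $R^2$ rises because the unexplained variance in y becomes negligible relative to the stretched x axis.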
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Bernd Meeuwßen
Released under CC0: Public Domain
We propose a multivariate normality test against skew normal distributions using higher-order log-likelihood derivatives, which is asymptotically equivalent to the likelihood ratio but only requires estimation under the null. Numerically, it is the supremum of the univariate skewness coefficient test over all linear combinations of the variables. We can simulate its exact finite sample distribution for any multivariate dimension and sample size. Our Monte Carlo exercises confirm its power advantages over alternative approaches. Finally, we apply it to the joint distribution of US city sizes in two consecutive censuses finding that non-normality is very clearly seen in their growth rates.
https://www.ycharts.com/terms
View daily market updates and historical trends for SKEW from the United States. Source: Chicago Board Options Exchange. Track economic data with YCharts ana…
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Reproducibility package for the article:
Reaction times and other skewed distributions: problems with the mean and the median
Guillaume A. Rousselet & Rand R. Wilcox
preprint: https://psyarxiv.com/3y54r
doi: 10.31234/osf.io/3y54r
This package contains all the code and data to reproduce the figures and analyses in the article.
https://spdx.org/licenses/CC0-1.0.html
To improve flood-frequency estimates at rural streams in Mississippi, annual exceedance probability (AEP) flows at gaged streams in Mississippi and regional-regression equations, used to estimate annual exceedance probability flows for ungaged streams in Mississippi, were developed by using current geospatial data, additional statistical methods, and annual peak-flow data through the 2013 water year. The regional-regression equations were derived from statistical analyses of peak-flow data, basin characteristics associated with 281 streamgages, the generalized skew from Bulletin 17B (Interagency Advisory Committee on Water Data, 1982), and a newly developed study-specific skew for select four-digit hydrologic unit code (HUC4) watersheds in Mississippi. Four flood regions were identified based on residuals from the regional-regression analyses. No analysis was conducted for streams in the Mississippi Alluvial Plain flood region because of a lack of long-term streamflow data and poorly defined basin characteristics. Flood regions containing sites with similar basin and climatic characteristics yielded better regional-regression equations with lower error percentages. The generalized least squares method was used to develop the final regression models for each flood region for annual exceedance probability flows. The peak-flow statistics were estimated by fitting a log-Pearson type III distribution to records of annual peak flows and then applying two additional statistical methods: (1) the expected moments algorithm to help describe uncertainty in annual peak flows and to better represent missing and historical record; and (2) the generalized multiple Grubbs-Beck test to screen out potentially influential low outliers and to better fit the upper end of the peak-flow distribution. Standard errors of prediction of the generalized least-squares models ranged from 28 to 46 percent. Pseudo coefficients of determination of the models ranged from 91 to 96 percent.
Flood Region A, located in north-central Mississippi, contained 27 streamgages with drainage areas that ranged from 1.41 to 612 square miles. The 1% annual exceedance probability had a standard error of prediction of 31 percent which was lower than the prediction errors in Flood Regions B and C.
In the fixed-effects stochastic frontier model an efficiency measure relative to the best firm in the sample is universally employed. This paper considers a new measure relative to the worst firm in the sample. We find that estimates of this measure have smaller bias than those of the traditional measure when the sample consists of many firms near the efficient frontier. Moreover, a two-sided measure relative to both the best and the worst firms is proposed. Simulations suggest that the new measures may be preferred depending on the skewness of the inefficiency distribution and the scale of efficiency differences.
This dataset contains upper air Skew-T Log-P charts taken at Denver, Colorado during the ICE-L project. The imagery are in GIF format. The imagery cover the time span from 2007-10-24 12:00:00 to 2008-01-03 12:00:00.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Rates of type I errors in the UKS test for 3 representative experimental designs (rows) and the 4 skewed distributions shown in Figure 3B (columns). In each design, the UKS test was applied before and after log-transforming the random datasets. The rate for each design is the percentage of 60000 random datasets with a null factor effect that the UKS test found significant at the 0.05 threshold. The type I error rates obtained for the same data with the Kruskal-Wallis test substituted for ANOVA are also indicated for the third design. Overall, either log-transformation of skewed data or use of a per-individual nonparametric test guards the UKS test against excessive type I errors.
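The effect of the log-transformation mentioned in this caption can be sketched in isolation. The example below does not implement the UKS test or reproduce the reported error rates; it only shows, on hypothetical lognormal-shaped data built from an evenly spaced grid of normal quantiles, that taking logs removes the skew. The function name `sample_skewness` and the sample size are illustrative assumptions.

```python
import math
from statistics import NormalDist


def sample_skewness(xs):
    """Moment-based skewness coefficient m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical data: exponentiated standard-normal quantiles (right-skewed).
n = 200
z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
raw = [math.exp(v) for v in z]
logged = [math.log(v) for v in raw]

skew_raw = sample_skewness(raw)
skew_log = sample_skewness(logged)
print(f"skewness before log-transform: {skew_raw:.3f}")
print(f"skewness after log-transform:  {skew_log:.3f}")
```

Because the logged values are symmetric around zero by construction, their skewness is numerically zero, whereas the raw values are strongly right-skewed.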
Quantile regression provides a powerful tool to study the effects of covariates on key quantiles of the conditional distribution. Yet we often still lack a general picture of how covariates affect the overall shape of the conditional distribution. Using quantile regression estimation and quantile-based measures of spread, skewness and kurtosis, we propose spread regression, skewness regression and kurtosis regression as empirical tools to quantify the effects of covariates on the spread, skewness and kurtosis of the conditional distribution. This methodology is then applied to U.S. wage data during 1980-2019 with substantive findings, and a comparison is made with a moment-based robust approach. In addition, we decompose changes in the spread into composition effects and structural effects as an effort to understand rising inequality. We also provide Stata commands spreadreg, skewreg and kurtosisreg, available from SSC, for easy implementation of spread, skewness and kurtosis regressions.
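As a rough illustration of what quantile-based shape measures look like, the sketch below computes an unconditional interquartile spread and Bowley's quartile skewness on a small hypothetical sample. The abstract does not say which quantile-based definitions the authors use, so these two choices are assumptions, and the code does not reproduce the Stata commands or any regression step. The helper names `quantile`, `quantile_spread` and `bowley_skewness` are hypothetical.

```python
def quantile(sorted_xs, p):
    """Quantile by linear interpolation between order statistics."""
    pos = (len(sorted_xs) - 1) * p
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    frac = pos - lo
    return sorted_xs[lo] * (1 - frac) + sorted_xs[hi] * frac


def quantile_spread(xs):
    """Interquartile range Q(0.75) - Q(0.25)."""
    s = sorted(xs)
    return quantile(s, 0.75) - quantile(s, 0.25)


def bowley_skewness(xs):
    """Bowley's measure: (Q3 + Q1 - 2 * Q2) / (Q3 - Q1), bounded in [-1, 1]."""
    s = sorted(xs)
    q1, q2, q3 = quantile(s, 0.25), quantile(s, 0.5), quantile(s, 0.75)
    return (q3 + q1 - 2 * q2) / (q3 - q1)


# Hypothetical samples: one symmetric, one right-skewed.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 1, 2, 2, 3, 4, 6, 10, 20]

spread_skewed = quantile_spread(right_skewed)
skew_symmetric = bowley_skewness(symmetric)
skew_right = bowley_skewness(right_skewed)
print(spread_skewed, skew_symmetric, skew_right)
```

Measures of this kind depend only on a few quantiles, which is why they are less sensitive to extreme observations than moment-based skewness.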
https://spdx.org/licenses/CC0-1.0.html
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective: People often have their decisions influenced by rare outcomes, such as buying a lottery ticket and believing they will win, or not buying a product because of a few negative reviews. Previous research has pointed out that this tendency is due to cognitive issues such as flaws in probability weighting. In this study we examine an alternative hypothesis: that people’s search behavior is biased by rare outcomes, and they can adjust the estimation of option value to be closer to the true mean, reflecting cognitive processes to adjust for sampling bias.

Methods: We recruited 180 participants through Prolific to take part in an online shopping task. On each trial, participants saw a histogram with five bins, representing the percentage of one- to five-star ratings of previous customers on a product. They could click on each bin of the histogram to examine an individual review that gave that product the corresponding star rating; the review was represented using a number from 0–100 called the positivity score. The goal of the participants was to sample the bins so that they could get the closest possible estimate of the average positivity score, and they were incentivized based on accuracy of estimation. We varied the shape of the histograms within subjects and the number of samples between subjects to examine how rare outcomes in skewed distributions influenced sampling behavior and whether having more samples would help people adjust their estimation to be closer to the true mean.

Results: Binomial tests confirmed sampling biases toward rare outcomes. Compared with 1% expected under unbiased sampling, participants allocated 11% and 12% of samples to the rarest outcome bin in the negatively and positively skewed conditions, respectively (ps < 0.001). A Bayesian linear mixed-effects analysis examined the effect of skewness and samples on estimation adjustment, defined as the difference between experienced/observed means and participants’ estimates. In the negatively skewed condition, estimates were on average 7% closer to the true mean compared with the observed means (10-sample ∆ = −0.07, 95% CI [−0.08, −0.06]; 20-sample ∆ = −0.07, 95% CI [−0.08, −0.06]). In the positively skewed condition, estimates also moved closer to the true mean (10-sample ∆ = 0.02, 95% CI [0.01, 0.04]; 20-sample ∆ = 0.03, 95% CI [0.02, 0.04]). Still, participants’ estimates deviated from the true mean by about 9.3% on average, underscoring the persistent influence of sampling bias.

Conclusion: These findings demonstrate how search biases systematically affect distributional judgments and how cognitive processes interact with biased sampling. The results have implications for human–algorithm interactions in areas such as e-commerce, social media, and politically sensitive decision-making contexts.
https://spdx.org/licenses/CC0-1.0.html
Quantitative-genetic models of differentiation under migration-selection balance often rely on the assumption of normally distributed genotypic and phenotypic values. When a population is subdivided into demes with selection toward different local optima, migration between demes may result in asymmetric, or skewed, local distributions. Using a simplified two-habitat model, we derive formulas without a priori assuming a Gaussian distribution of genotypic values, and we find expressions that naturally incorporate higher moments, such as skew. These formulas yield predictions of the expected divergence under migration-selection balance that are more accurate than models assuming Gaussian distributions, which illustrates the importance of incorporating these higher moments to assess the response to selection in heterogeneous environments. We further show with simulations that traits with loci of large effect display the largest skew in their distribution at migration-selection balance.
This dataset contains upper air Skew-T Log-P charts taken at Boise, Idaho during the ICE-L project. The imagery are in GIF format. The imagery cover the time span from 2007-11-08 12:00:00 to 2008-01-03 12:00:00.
Median and trimmed-mean inflation rates tend to be useful estimates of trend inflation over long periods, but they can exhibit persistent departures from the underlying trend over shorter horizons. In this Commentary, we document that the extent of this bias is related to the degree of skewness in the distribution of price changes. The shift in the skewness of the cross-sectional price-change distribution during the pandemic means that median PCE and trimmed-mean PCE inflation rates have recently been understating the trend in PCE inflation by about 15 and 35 basis points, respectively.
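The link between skewness and the gap between these measures can be seen in a toy cross-section. The sketch below is only an illustration under simplifying assumptions: the price changes are hypothetical, every component has equal weight (the published median and trimmed-mean PCE measures are expenditure-weighted), and the symmetric 10% trim and the function name `trimmed_mean` are choices made for the example.

```python
from statistics import mean, median


def trimmed_mean(xs, trim_fraction):
    """Mean after dropping the given fraction of observations from each tail."""
    s = sorted(xs)
    k = int(len(s) * trim_fraction)
    return mean(s[k:len(s) - k])


# Hypothetical, equally weighted price changes (percent) with a long right tail.
price_changes = [-1.0, 0.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 4.0, 16.0]

overall_mean = mean(price_changes)
overall_median = median(price_changes)
overall_trimmed = trimmed_mean(price_changes, 0.10)
print(overall_mean, overall_median, overall_trimmed)
```

With a right-skewed cross-section, both the median and the trimmed mean sit below the mean, which is the direction of the understatement the Commentary describes.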
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Negatively skewed data arise occasionally in statistical practice; perhaps the most familiar example is the distribution of human longevity. Although other generalizations of the normal distribution exist, we demonstrate a new alternative that apparently fits human longevity data better. We propose an alternative approach of a normal distribution whose scale parameter is conditioned on attained age. This approach is consistent with previous findings that longevity conditioned on survival to the modal age behaves like a normal distribution. We derive such a distribution and demonstrate its accuracy in modeling human longevity data from life tables. The new distribution is characterized by (1) an intuitively straightforward genesis; (2) closed forms for the pdf, cdf, mode, quantile, and hazard functions; and (3) accessibility to non-statisticians, based on its close relationship to the normal distribution.
Traffic analytics, rankings, and competitive metrics for skew.com as of September 2025
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Example data for normally distributed and skewed datasets.