55 datasets found
  1. Reaction times and other skewed distributions: problems with the mean and...

    • figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Guillaume Rousselet; Rand Wilcox (2023). Reaction times and other skewed distributions: problems with the mean and the median [Dataset]. http://doi.org/10.6084/m9.figshare.6911924.v4
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Guillaume Rousselet; Rand Wilcox
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Reproducibility package for the article: Reaction times and other skewed distributions: problems with the mean and the median. Guillaume A. Rousselet & Rand R. Wilcox. Preprint: https://psyarxiv.com/3y54r; doi: 10.31234/osf.io/3y54r. This package contains all the code and data to reproduce the figures and analyses in the article.

  2. Dataset for: Some Remarks on the R2 for Clustering

    • wiley.figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Nicola Loperfido; Thaddeus Tarpey (2023). Dataset for: Some Remarks on the R2 for Clustering [Dataset]. http://doi.org/10.6084/m9.figshare.6124508.v1
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Wiley (https://www.wiley.com/)
    Authors
    Nicola Loperfido; Thaddeus Tarpey
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    A common descriptive statistic in cluster analysis is the $R^2$ that measures the overall proportion of variance explained by the cluster means. This note highlights properties of the $R^2$ for clustering. In particular, we show that generally the $R^2$ can be artificially inflated by linearly transforming the data by "stretching" and by projecting. Also, the $R^2$ for clustering will often be a poor measure of clustering quality in high-dimensional settings. We also investigate the $R^2$ for clustering for misspecified models. Several simulation illustrations are provided highlighting weaknesses in the clustering $R^2$, especially in high-dimensional settings. A functional data example is given showing how the $R^2$ for clustering can vary dramatically depending on how the curves are estimated.
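
    The "stretching" effect can be illustrated with a small sketch. It assumes the usual definition $R^2 = 1 - W/T$, where $W$ is the within-cluster sum of squares and $T$ is the total sum of squares, with cluster labels held fixed. The toy points, the labels, and the helper names clustering_r2 and stretch_x are hypothetical and are not taken from the dataset.

```python
def clustering_r2(points, labels):
    """R^2 = 1 - (within-cluster SS) / (total SS) for fixed cluster labels."""
    n = len(points)
    dim = len(points[0])
    grand = [sum(p[d] for p in points) / n for d in range(dim)]
    total_ss = sum((p[d] - grand[d]) ** 2 for p in points for d in range(dim))
    within_ss = 0.0
    for k in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == k]
        centre = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        within_ss += sum((p[d] - centre[d]) ** 2 for p in members for d in range(dim))
    return 1.0 - within_ss / total_ss


def stretch_x(points, factor):
    """Linearly stretch the first coordinate (hypothetical helper)."""
    return [(factor * x, y) for x, y in points]


# Hypothetical toy data: two clusters separated along x, with spread along y.
points = [(0.0, -2.0), (1.0, 2.0), (0.0, 2.0), (1.0, -2.0),
          (4.0, -2.0), (5.0, 2.0), (4.0, 2.0), (5.0, -2.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

r2_original = clustering_r2(points, labels)
r2_stretched = clustering_r2(stretch_x(points, 10.0), labels)
print(r2_original, r2_stretched)
```

    The cluster assignment is identical in both calls; only the scale of the separating axis changes, yet the second value is larger.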

  3. Data from: Body temperature distributions of active diurnal lizards in three...

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    zip
    Updated Aug 4, 2018
    Cite
    Raymond B. Huey; Eric R. Pianka (2018). Body temperature distributions of active diurnal lizards in three deserts: skewed up or skewed down? [Dataset]. http://doi.org/10.5061/dryad.45g3s
    Available download formats: zip
    Dataset updated
    Aug 4, 2018
    Dataset provided by
    University of Washington
    The University of Texas at Austin
    Authors
    Raymond B. Huey; Eric R. Pianka
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    North America, Africa, Australia
    Description
    1. The performance of ectotherms integrated over time depends in part on the position and shape of the distribution of body temperatures (Tb) experienced during activity. For several complementary reasons, physiological ecologists have long expected that Tb distributions during activity should have a long left tail (left-skewed); but only infrequently have they quantified the magnitude and direction of Tb skewness in nature.
    2. To evaluate whether left-skewed Tb distributions are general for diurnal desert lizards, we compiled and analyzed Tb (∑ = 9,023 temperatures) from our own prior studies of active desert lizards on three continents (25 species in Western Australia, 10 in the Kalahari Desert of Africa, and 10 species in western North America). We gathered these data over several decades, using standardized techniques.
    3. Many species showed significantly left-skewed Tb distributions, even when records were restricted to summer months. However, magnitudes of skewness were always small, such that mean Tb were never more than 1°C lower than median Tb. The significance of Tb skewness was sensitive to sample size, and power tests reinforced this sensitivity.
    4. The magnitude of skewness was not obviously related to phylogeny, desert, body size, or median body temperature. Moreover, formal phylogenetic analysis is inappropriate because geography and phylogeny are confounded (that is, are highly collinear).
    5. Skewness might be limited if lizards pre-warm inside retreats before emerging in the morning, emerge only when operative temperatures are high enough to speed warming to activity Tb, or if cold lizards are especially wary and difficult to spot or catch. Telemetry studies may help evaluate these possibilities.
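
    The relation between left skew and a mean below the median can be sketched as follows. The temperatures are invented for illustration (they are not from the dataset), and the moment-based skewness coefficient used here is an assumption, since the listing does not say which estimator the authors used.

```python
import statistics


def sample_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical body temperatures (°C) with a few cool readings in the left tail.
tb = [29.0, 33.5, 35.0, 36.0, 36.5, 37.0, 37.0, 37.5, 38.0, 38.5]

skew = sample_skewness(tb)
mean_minus_median = statistics.mean(tb) - statistics.median(tb)
print(skew, mean_minus_median)
```

    Both quantities come out negative for this sample: the long left tail pulls the mean below the median.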
  4. Data from: Adjusting Median and Trimmed-Mean Inflation Rates for Bias Based...

    • clevelandfed.org
    Updated Mar 24, 2022
    Cite
    Federal Reserve Bank of Cleveland (2022). Adjusting Median and Trimmed-Mean Inflation Rates for Bias Based on Skewness [Dataset]. https://www.clevelandfed.org/publications/economic-commentary/2022/ec-202205-adjusting-median-and-trimmed-mean-inflation-rates-for-bias-based-on-skewness
    Dataset updated
    Mar 24, 2022
    Dataset authored and provided by
    Federal Reserve Bank of Cleveland (https://www.clevelandfed.org/)
    Description

    Median and trimmed-mean inflation rates tend to be useful estimates of trend inflation over long periods, but they can exhibit persistent departures from the underlying trend over shorter horizons. In this Commentary, we document that the extent of this bias is related to the degree of skewness in the distribution of price changes. The shift in the skewness of the cross-sectional price-change distribution during the pandemic means that median PCE and trimmed-mean PCE inflation rates have recently been understating the trend in PCE inflation by about 15 and 35 basis points, respectively.

  5. Data Sheet 1_The impact of distribution properties on sampling behavior.docx...

    • figshare.com
    docx
    Updated Sep 30, 2025
    Cite
    Thai Quoc Cao; Benjamin Scheibehenne (2025). Data Sheet 1_The impact of distribution properties on sampling behavior.docx [Dataset]. http://doi.org/10.3389/fpsyg.2025.1597227.s001
    Available download formats: docx
    Dataset updated
    Sep 30, 2025
    Dataset provided by
    Frontiers
    Authors
    Thai Quoc Cao; Benjamin Scheibehenne
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Objective: People often have their decisions influenced by rare outcomes, such as buying a lottery ticket and believing they will win, or not buying a product because of a few negative reviews. Previous research has pointed out that this tendency is due to cognitive issues such as flaws in probability weighting. In this study we examine an alternative hypothesis: that people’s search behavior is biased by rare outcomes, and they can adjust the estimation of option value to be closer to the true mean, reflecting cognitive processes to adjust for sampling bias.

    Methods: We recruited 180 participants through Prolific to take part in an online shopping task. On each trial, participants saw a histogram with five bins, representing the percentage of one- to five-star ratings of previous customers on a product. They could click on each bin of the histogram to examine an individual review that gave that product the corresponding star rating; the review was represented using a number from 0–100 called the positivity score. The goal of the participants was to sample the bins so that they could get as close an estimate of the average positivity score as possible, and they were incentivized based on accuracy of estimation. We varied the shape of the histograms within subject and the number of samples they had between subjects to examine how rare outcomes in skewed distributions influenced sampling behavior and whether having more samples would help people adjust their estimation to be closer to the true mean.

    Results: Binomial tests confirmed sampling biases toward rare outcomes. Compared with 1% expected under unbiased sampling, participants allocated 11% and 12% of samples to the rarest outcome bin in the negatively and positively skewed conditions, respectively (ps < 0.001). A Bayesian linear mixed-effects analysis examined the effect of skewness and samples on estimation adjustment, defined as the difference between experienced/observed means and participants’ estimates. In the negatively skewed distribution, estimates were on average 7% closer to the true mean compared with the observed means (10-sample ∆ = −0.07, 95% CI [−0.08, −0.06]; 20-sample ∆ = −0.07, 95% CI [−0.08, −0.06]). In the positively skewed condition, estimates also moved closer to the true mean (10-sample ∆ = 0.02, 95% CI [0.01, 0.04]; 20-sample ∆ = 0.03, 95% CI [0.02, 0.04]). Still, participants’ estimates deviated from the true mean by about 9.3% on average, underscoring the persistent influence of sampling bias.

    Conclusion: These findings demonstrate how search biases systematically affect distributional judgments and how cognitive processes interact with biased sampling. The results have implications for human–algorithm interactions in areas such as e-commerce, social media, and politically sensitive decision-making contexts.

  6. Data from: Selection on skewed characters and the paradox of stasis

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Sep 8, 2017
    Cite
    Suzanne Bonamour; Céline Teplitsky; Anne Charmantier; Pierre-André Crochet; Luis-Miguel Chevin (2017). Selection on skewed characters and the paradox of stasis [Dataset]. http://doi.org/10.5061/dryad.pt07g
    Available download formats: zip
    Dataset updated
    Sep 8, 2017
    Dataset provided by
    Centre National de la Recherche Scientifique
    Authors
    Suzanne Bonamour; Céline Teplitsky; Anne Charmantier; Pierre-André Crochet; Luis-Miguel Chevin
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Observed phenotypic responses to selection in the wild often differ from predictions based on measurements of selection and genetic variance. An overlooked hypothesis to explain this paradox of stasis is that a skewed phenotypic distribution affects natural selection and evolution. We show through mathematical modelling that, when a trait selected for an optimum phenotype has a skewed distribution, directional selection is detected even at evolutionary equilibrium, where it causes no change in the mean phenotype. When environmental effects are skewed, Lande and Arnold’s (1983) directional gradient is in the direction opposite to the skew. In contrast, skewed breeding values can displace the mean phenotype from the optimum, causing directional selection in the direction of the skew. These effects can be partitioned out using alternative selection estimates based on average derivatives of individual relative fitness, or additive genetic covariances between relative fitness and trait (Robertson-Price identity). We assess the validity of these predictions using simulations of selection estimation under moderate sample sizes. Ecologically relevant traits may commonly have skewed distributions, as we here exemplify with avian laying date (repeatedly described as more evolutionarily stable than expected), so this skewness should be accounted for when investigating evolutionary dynamics in the wild.

  7. Skewness project raw data files and codes

    • figshare.com
    xlsx
    Updated Mar 14, 2022
    Cite
    Raunak Dey; Sreekanth K Manikandan (2022). Skewness project raw data files and codes [Dataset]. http://doi.org/10.6084/m9.figshare.17703269.v2
    Available download formats: xlsx
    Dataset updated
    Mar 14, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Raunak Dey; Sreekanth K Manikandan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains raw data files and base codes to analyze them.
    A. The 'powerx_y.xlsx' files are the data files with the one-dimensional trajectory of optically trapped probes modulated by an Ornstein-Uhlenbeck noise of given amplitude 'x'. For the corresponding diffusion amplitude $A = 0.1 \times (0.6 \times 10^{-6})^2$ m$^2$/s, x is labelled as '1'.
    B. The codes are of three types. The skewness codes are used to calculate the skewness of the trajectory. The error_in_fit codes are used to calculate deviations from arcsine behavior. The sigma_exp codes point to the deviation of the mean from 0.5. All the codes are written three times to look at T+, Tlast and Tmax.
    C. More information can be found in the manuscript.

  8. Annual peak-flow data and results of flood-frequency analysis for 76...

    • data.usgs.gov
    • s.cnmilf.com
    • +1more
    Updated Sep 3, 2024
    + more versions
    Cite
    Daniel Wagner; Jon Voss; Roger Claybrooke; David Heimann (2024). Annual peak-flow data and results of flood-frequency analysis for 76 selected streamflow gaging stations operated by the U.S. Geological Survey in the upper White River basin, Missouri and Arkansas, computed using an updated generalized (regional) flood skew [Dataset]. http://doi.org/10.5066/P9C3L7IN
    Dataset updated
    Sep 3, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Daniel Wagner; Jon Voss; Roger Claybrooke; David Heimann
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    1904 - 2020
    Area covered
    Arkansas, Missouri
    Description

    This dataset contains site information, basin characteristics, results of flood-frequency analysis, and a generalized (regional) flood skew for 76 selected streamgages operated by the U.S. Geological Survey (USGS) in the upper White River basin (4-digit hydrologic unit 1101) in southern Missouri and northern Arkansas. The Little Rock District U.S. Army Corps of Engineers (USACE) needed updated estimates of streamflows corresponding to selected annual exceedance probabilities (AEPs) and a basin-specific regional flood skew. USGS selected 111 candidate streamgages in the study area that had 20 or more years of gaged annual peak-flow data available through the 2020 water year. After screening for regulation, urbanization, redundant/nested basins, drainage areas greater than 2,500 square miles, and streamgage basins located in the Mississippi Alluvial Plain (8-digit hydrologic unit 11010013), 77 candidate streamgages remained. After conducting the initial flood-frequency analysis ...

  9. Electronics Project(2600+ projects)

    • kaggle.com
    zip
    Updated Nov 13, 2025
    Cite
    NICK-2908 (2025). Electronics Project(2600+ projects) [Dataset]. https://www.kaggle.com/datasets/nick2908/electronics-project2600-projects
    Available download formats: zip (274002 bytes)
    Dataset updated
    Nov 13, 2025
    Authors
    NICK-2908
    Description

    Summary

    This dataset contains over 2,600 circuit projects scraped from Instructables, focusing on the "Circuits" category. It includes project titles, authors, engagement metrics (views, likes), and the primary component used (Instruments).

    How This Data Was Collected

    I built a web scraper using Python and Selenium to gather all project links (over 2,600 of them) by handling the "Load All" button. The full page source was saved, and I then used BeautifulSoup to parse the HTML and extract the raw data for each project.

    Data Cleaning (The Important Part!)

    The raw data was very messy. I performed a full data cleaning pipeline in a Colab notebook using Pandas.

    • Converted Text to Numbers: Views and Likes were text fields (object).
    • Handled "K" Values: Found and converted "K" values (e.g., "2.2K") into proper numbers (2200).
    • Handled Missing Data: Replaced all "N/A" strings with null values.
    • Mean Imputation: To keep the dataset complete, I filled all missing Likes and Views with the mean (average) of the respective column.
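
    The cleaning steps listed above can be sketched in plain Python. This is an illustration only, not the author's code (the description says the actual pipeline used Pandas in a Colab notebook); the helper names parse_count and impute_mean and the sample values are hypothetical.

```python
def parse_count(text):
    """Convert a scraped count such as '2.2K' to a number; 'N/A' becomes None."""
    text = text.strip()
    if text == "N/A":
        return None  # treat the placeholder string as a missing value
    if text.upper().endswith("K"):
        return float(text[:-1]) * 1000  # "2.2K" -> 2200.0
    return float(text)


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Hypothetical raw text values for a Views column.
raw_views = ["2.2K", "850", "N/A", "13K"]
parsed = [parse_count(v) for v in raw_views]
views = impute_mean(parsed)
print(views)
```

    Note that the imputed entry takes the mean of the three observed values, so a single very large count shifts every filled-in value.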

    Key Insights & Analysis

    1. "Viral" Effect (High Skew): The Views and Likes data is highly right-skewed (skewness of ~9.5). This shows a "viral" effect where a tiny number of superstar projects get the vast majority of all views and likes.

    2. Log-Transformation: Because of the skew, I created log_Views and log_Likes columns. A 2D density plot of these log-transformed columns shows a strong positive correlation (as likes increase, views increase) and that the most "typical" project gets around 30-40 likes and 4,000-5,000 views.

    3. Top Instruments: I've also analyzed the most popular instruments to see which ones get the most engagement.
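
    Why a log transform helps with right-skewed counts can be sketched as below. The like counts are invented, and the use of log1p (the natural log of 1 + x) is an assumption: the description does not say which logarithm was used to build log_Views and log_Likes.

```python
import math


def sample_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical like counts: most projects are small, a few are very large.
likes = [5, 10, 20, 40, 80, 160, 320, 640, 1280, 2560]
log_likes = [math.log1p(v) for v in likes]  # log(1 + x) also handles zero counts

raw_skew = sample_skewness(likes)
log_skew = sample_skewness(log_likes)
print(raw_skew, log_skew)
```

    On this toy sample the raw counts are strongly right-skewed, while the transformed values are close to symmetric.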

    Column Descriptions

    • Title: The name of the project.
    • Project_Admin: The author/creator of the project.
    • Image_URL: The URL for the project's cover image.
    • Views: The total number of views (cleaned and imputed).
    • Likes: The total number of likes/favorites (cleaned and imputed).
    • Instruments: The main component or category tag (e.g., "Arduino", "Raspberry Pi").
  10. Data from: Improving structured population models with more realistic...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1more
    zip
    Updated Jun 14, 2019
    Cite
    Megan L. Peterson; William Morris; Cristina Linares; Daniel Doak (2019). Improving structured population models with more realistic representations of non-normal growth [Dataset]. http://doi.org/10.5061/dryad.t6c3573
    Available download formats: zip
    Dataset updated
    Jun 14, 2019
    Dataset provided by
    Universitat de Barcelona
    University of Colorado Boulder
    Duke University
    Authors
    Megan L. Peterson; William Morris; Cristina Linares; Daniel Doak
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    NW Mediterranean Sea, Niwot Ridge, USA, Alaska, Kennicott Valley, Colorado
    Description
    1. Structured population models are among the most widely used tools in ecology and evolution. Integral projection models (IPMs) use continuous representations of how survival, reproduction, and growth change as functions of state variables such as size, requiring fewer parameters to be estimated than projection matrix models (PPMs). Yet almost all published IPMs make an important assumption: that size-dependent growth transitions are or can be transformed to be normally distributed. In fact, many organisms exhibit highly skewed size transitions. Small individuals can grow more than they can shrink, and large individuals may often shrink more dramatically than they can grow. Yet the implications of such skew for inference from IPMs have not been explored, nor have general methods been developed to incorporate skewed size transitions into IPMs, or deal with other aspects of real growth rates, including bounds on possible growth or shrinkage.
    2. Here we develop a flexible approach to modeling skewed growth data using a modified beta regression model. We propose that sizes first be converted to a (0,1) interval by estimating size-dependent minimum and maximum sizes through quantile regression. Transformed data can then be modeled using beta regression with widely available statistical tools. We demonstrate the utility of this approach using demographic data for a long-lived plant, gorgonians, and an epiphytic lichen. Specifically, we compare inferences of population parameters from discrete PPMs to those from IPMs that either assume normality or incorporate skew using beta regression or, alternatively, a skewed normal model.
    3. The beta and skewed normal distributions accurately capture the mean, variance, and skew of real growth distributions. Incorporating skewed growth into IPMs decreases population growth and estimated lifespan relative to IPMs that assume normally-distributed growth, and more closely approximates the parameters of PPMs that do not assume a particular growth distribution. A bounded distribution, such as the beta, also avoids the eviction problem caused by predicting some growth outside the modeled size range.
    4. Incorporating biologically relevant skew in growth data has important consequences for inference from IPMs. The approaches we outline here are flexible and easy to implement with existing statistical tools.
  11. Impact of limited data availability on the accuracy of project duration...

    • data.mendeley.com
    Updated Nov 22, 2022
    + more versions
    Cite
    Naimeh Sadeghi (2022). Impact of limited data availability on the accuracy of project duration estimation in project networks [Dataset]. http://doi.org/10.17632/bjfdw6xbxw.3
    Dataset updated
    Nov 22, 2022
    Authors
    Naimeh Sadeghi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This database includes simulated data showing the accuracy of estimated probability distributions of project durations when limited data are available for the project activities. The base project networks are taken from PSPLIB. Then, various stochastic project networks are synthesized by changing the variability and skewness of project activity durations.
    Number of variables: 20
    Number of cases/rows: 114240
    Variable List:
    • Experiment ID: The ID of the experiment
    • Experiment for network: The ID of the experiment for each of the synthesized networks
    • Network ID: ID of the synthesized network
    • #Activities: Number of activities in the network, including start and finish activities
    • Variability: Variance of the activities in the network (this value can be either high, low, medium or rand, where rand shows a random combination of low, high and medium variance in the network activities)
    • Skewness: Skewness of the activities in the network (skewness can be either right, left, None or rand, where rand shows a random combination of right, left, and none skewed in the network activities)
    • Fitted distribution type: Distribution type used to fit on sampled data
    • Sample size: Number of sampled data used for the experiment resembling limited data condition
    • Benchmark 10th percentile: 10th percentile of project duration in the benchmark stochastic project network
    • Benchmark 50th percentile: 50th percentile of project duration in the benchmark stochastic project network
    • Benchmark 90th percentile: 90th percentile of project duration in the benchmark stochastic project network
    • Benchmark mean: Mean project duration in the benchmark stochastic project network
    • Benchmark variance: Variance of project duration in the benchmark stochastic project network
    • Experiment 10th percentile: 10th percentile of project duration distribution for the experiment
    • Experiment 50th percentile: 50th percentile of project duration distribution for the experiment
    • Experiment 90th percentile: 90th percentile of project duration distribution for the experiment
    • Experiment mean: Mean of project duration distribution for the experiment
    • Experiment variance: Variance of project duration distribution for the experiment
    • K-S: Kolmogorov–Smirnov test comparing benchmark distribution and project duration distribution of the experiment
    • P_value: the P-value based on the distance calculated in the K-S test
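
    For readers unfamiliar with the K-S column, the distance it refers to can be sketched as the largest gap between two empirical distribution functions. This is a generic two-sample illustration under the assumption that both distributions are represented by samples; the listing does not say how the dataset's K-S value was computed, and the function name and sample durations are hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample K-S distance: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    distance = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every observation <= x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        distance = max(distance, abs(i / len(a) - j / len(b)))
    return distance


# Hypothetical project durations (days).
benchmark = [10, 12, 14, 16]
experiment = [11, 13, 15, 17, 19]
d = ks_statistic(benchmark, experiment)
print(d)
```

    A distance of 0 means the two empirical distributions coincide; a distance of 1 means the samples do not overlap at all.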

  12. Supplementary data for the paper "Why psychologists should not default to...

    • data.4tu.nl
    zip
    Updated Apr 28, 2025
    Cite
    Joost de Winter (2025). Supplementary data for the paper "Why psychologists should not default to Welch’s t-test instead of Student’s t-test (and why the Anderson–Darling test is an underused alternative)" [Dataset]. http://doi.org/10.4121/e8e6861a-7ab0-4b6d-bd67-5f95029322c5.v3
    Available download formats: zip
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    4TU.ResearchData
    Authors
    Joost de Winter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper evaluates the claim that Welch’s t-test (WT) should replace the independent-samples t-test (IT) as the default approach for comparing sample means. Simulations involving unequal and equal variances, skewed distributions, and different sample sizes were performed. For normal distributions, we confirm that the WT maintains the false positive rate close to the nominal level of 0.05 when sample sizes and standard deviations are unequal. However, the WT was found to yield inflated false positive rates under skewed distributions, even with relatively large sample sizes, whereas the IT avoids such inflation. A complementary empirical study based on gender differences in two psychological scales corroborates these findings. Finally, we contend that the null hypothesis of unequal variances together with equal means lacks plausibility, and that empirically, a difference in means typically coincides with differences in variance and skewness. An additional analysis using the Kolmogorov-Smirnov and Anderson-Darling tests demonstrates that examining entire distributions, rather than just their means, can provide a more suitable alternative when facing unequal variances or skewed distributions. Given these results, researchers should remain cautious with software defaults, such as R favoring Welch’s test.

  13. Skewness and kurtosis of mean transverse momentum fluctuations at the LHC...

    • hepdata.net
    Updated 2024
    Cite
    HEPData (2024). Skewness and kurtosis of mean transverse momentum fluctuations at the LHC energies [Dataset]. http://doi.org/10.17182/hepdata.147284.v1
    Dataset updated
    2024
    Dataset provided by
    HEPData
    Description

    Measurements of higher-order fluctuations of $\langle p_\mathrm{T}\rangle$ distribution in Pb$-$Pb collisions, Xe$-$Xe collisions at $\sqrt{s_\mathrm{NN}}$ = 5.02 TeV and in pp collisions at $\sqrt{s}$ = 5.02 TeV using the ALICE detector at the CERN LHC. Standardized skewness, intensive skewness and kurtosis of $\langle p_\mathrm{T}\rangle$ are presented as a function of $\langle\mathrm{d}N_\mathrm{ch}/\mathrm{d}\eta\rangle^{1/3}_{|\eta|<0.5}$ for all the collision systems. The charged particles are selected in the transverse momentum range, 0.2 < $p_\mathrm{T}$ (GeV/c) < 3.0 and pseudorapidity range, $|\eta|$ < 0.8.

  14. Data from: SkewDB: A comprehensive database of GC and 10 other skews for...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1more
    zip
    Updated Oct 4, 2021
    Cite
    Bert Hubert (2021). SkewDB: A comprehensive database of GC and 10 other skews for over 28,000 chromosomes and plasmids [Dataset]. http://doi.org/10.5061/dryad.g4f4qrfr6
    Available download formats: zip
    Dataset updated
    Oct 4, 2021
    Dataset provided by
    Independent researcher
    Authors
    Bert Hubert
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    GC skew denotes the relative excess of G nucleotides over C nucleotides on the leading versus the lagging replication strand of eubacteria. While the effect is small, typically around 2.5%, it is robust and pervasive. GC skew and the analogous TA skew are a localized deviation from Chargaff’s second parity rule, which states that G and C, and T and A occur with (mostly) equal frequency even within a strand.

    Most bacteria also show the analogous TA skew. Different phyla show different kinds of skew and differing relations between TA and GC skew. This article introduces an open access database (https://skewdb.org) of GC and 10 other skews for over 28,000 chromosomes and plasmids. Further details like codon bias, strand bias, strand lengths and taxonomic data are also included.

    The SkewDB database can be used to generate or verify hypotheses. Since the origins of both the second parity rule, as well as GC skew itself, are not yet satisfactorily explained, such a database may enhance our understanding of microbial DNA.

    Methods

    The SkewDB analysis relies exclusively on the tens of thousands of FASTA and GFF3 files available through the NCBI download service, which covers both GenBank and RefSeq. The database includes bacteria, archaea and their plasmids. Furthermore, to ease analysis, the NCBI Taxonomy database is sourced and merged so output data can quickly be related to (super)phyla or specific species. No other data is used, which greatly simplifies processing. Data is read directly in the compressed format provided by NCBI.

    All results are emitted as standard CSV files. In the first step of the analysis, for each organism the FASTA sequence and the GFF3 annotation file are parsed. Every chromosome in the FASTA file is traversed from beginning to end, while a running total is kept for cumulative GC and TA skew. In addition, within protein coding genes, such totals are also kept separately for these skews on the first, second and third codon position. Furthermore, separate totals are kept for regions which do not code for proteins. In addition, to enable strand bias measurements, a cumulative count is maintained of nucleotides that are part of a positive or negative sense gene. The counter is increased for positive sense nucleotides, decreased for negative sense nucleotides, and left alone for non-genic regions.

    A separate counter is kept for non-genic nucleotides. Finally, G and C nucleotides are counted, regardless of whether they are part of a gene. These running totals are emitted at 4096-nucleotide intervals, a resolution suitable for determining skews and shifts. In addition, one-line summaries are stored for each chromosome. These lines include the RefSeq identifier of the chromosome, the full name mentioned in the FASTA file, plus counts of A, C, G and T nucleotides. Finally, five levels of taxonomic data are stored.
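The running totals described above can be sketched in a few lines of Python. This is a minimal illustration, not the SkewDB code: the function name `cumulative_skews` is hypothetical, TA skew is assumed to mean T minus A, and the per-codon-position and strand-bias counters are left out.

```python
def cumulative_skews(sequence, interval=4096):
    """Return (position, gc_skew, ta_skew) running totals every `interval` bases.

    Assumption: GC skew is counted as G minus C and TA skew as T minus A.
    """
    gc = 0  # running count of G minus C
    ta = 0  # running count of T minus A
    samples = []
    for position, base in enumerate(sequence.upper(), start=1):
        if base == "G":
            gc += 1
        elif base == "C":
            gc -= 1
        elif base == "T":
            ta += 1
        elif base == "A":
            ta -= 1
        # Emit the running totals at fixed intervals, as the description says.
        if position % interval == 0:
            samples.append((position, gc, ta))
    return samples


if __name__ == "__main__":
    # Toy sequence: a G-rich half followed by a C-rich half.
    toy = "GGGC" * 4 + "CCCG" * 4
    for row in cumulative_skews(toy, interval=8):
        print(row)
```

On the toy sequence the GC total rises through the G-rich half and falls back through the C-rich half, which is the shape the fitting step later models.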

    Chromosomes and plasmids of fewer than 100 thousand nucleotides are ignored, as these are too noisy to model faithfully. Plasmids are clearly marked in the database, enabling researchers to focus on chromosomes if so desired.

    Fitting

    Once the genomes have been summarised at 4096-nucleotide resolution, the skews are fitted to a simple model. The fits are based on four parameters. Alpha1 and alpha2 denote the relative excess of G over C on the leading and lagging strands. If alpha1 is 0.046, this means that for every 1000 nucleotides on the leading strand, the cumulative count of G excess increases by 46. The third parameter is div, and it describes how the chromosome is divided over leading and lagging strands. If this number is 0.557, the leading replication strand is modeled to make up 55.7% of the chromosome. The final parameter is shift (the dotted vertical line), which denotes the offset of the origin of replication compared to the DNA FASTA file. This parameter has no biological meaning in itself, and is an artifact of the DNA assembly process.
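One way to read the four parameters is as a piecewise-linear curve on a circular chromosome. The sketch below is an interpretation of the description, not the SkewDB implementation. It assumes that `shift` is given in nucleotides, that `alpha2` is a signed slope (typically negative on the lagging strand), and that the cumulative count starts at zero at FASTA position 0. The function name is hypothetical.

```python
def modeled_cumulative_skew(x, length, alpha1, alpha2, div, shift=0):
    """Modeled cumulative skew at FASTA position x, with 0 <= x < length.

    Assumptions: piecewise-linear model, shift in nucleotides, alpha2 signed.
    """
    leading_len = div * length  # nucleotides modeled as leading strand

    def from_origin(u):
        # Cumulative skew u nucleotides past the origin of replication.
        if u <= leading_len:
            return alpha1 * u
        return alpha1 * leading_len + alpha2 * (u - leading_len)

    u_start = (-shift) % length  # where FASTA position 0 sits, origin-relative
    u = (x - shift) % length     # where position x sits, origin-relative
    value = from_origin(u) - from_origin(u_start)
    if u < u_start:
        # Position x lies past the origin, so one end of the circle was crossed.
        value += from_origin(length)
    return value


if __name__ == "__main__":
    # The example from the text: alpha1 = 0.046 adds 46 per 1000 nucleotides.
    print(modeled_cumulative_skew(1000, 10000, 0.046, -0.046, 0.557))
```

With no shift, the first 1000 leading-strand nucleotides contribute 0.046 × 1000 = 46 to the cumulative G excess, matching the worked figure in the description.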

    The goodness-of-fit number consists of the root mean squared error of the fit, divided by the absolute mean skew. This latter correction is made to not penalize good fits for bacteria showing significant skew. GC skew tends to be defined very strongly, and it is therefore used to pick the div and shift parameters of the DNA sequence, which are then kept as a fixed constraint for all the other skews, which might not be present as clearly. The fitting process itself is a downhill simplex method optimization over the three dimensions, seeded with the average observed skew over the whole genome, and assuming there is no shift, and that the leading and lagging strands are evenly distributed. The simplex optimization is tuned so that it takes sufficiently large steps so it can reach the optimum even if some initial assumptions are off.
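The goodness-of-fit number can be illustrated as follows. This is a sketch under one stated assumption: "absolute mean skew" is taken here to be the mean of the absolute observed values, which the description does not spell out. The function name is hypothetical, and the downhill simplex search itself is not shown.

```python
import math


def goodness_of_fit(observed, predicted):
    """RMSE of the fit divided by the mean absolute observed skew.

    Assumption: "absolute mean skew" means mean(|observed|).
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(squared_errors) / len(observed))
    mean_abs = sum(abs(o) for o in observed) / len(observed)
    if mean_abs == 0:
        raise ValueError("mean absolute skew is zero; the ratio is undefined")
    return rmse / mean_abs


if __name__ == "__main__":
    observed = [10.0, 20.0, 30.0]
    predicted = [11.0, 19.0, 31.0]
    print(goodness_of_fit(observed, predicted))
```

Dividing by the skew magnitude makes the number scale-free, so a genome with a large skew is not penalized for having a proportionally larger absolute error.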

  15. At-site flood frequency statistics for unregulated streamgages in and near...

    • catalog.data.gov
    • data.usgs.gov
    Updated Sep 17, 2025
    U.S. Geological Survey (2025). At-site flood frequency statistics for unregulated streamgages in and near Virginia and West Virginia, 2025 [Dataset]. https://catalog.data.gov/dataset/at-site-flood-frequency-statistics-for-unregulated-streamgages-in-and-near-virginia-and-we
    Explore at:
    Dataset updated
    Sep 17, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    West Virginia
    Description

    This child item contains input files and output files for 857 rural, unregulated streamgages in and near Virginia and West Virginia processed using PeakFQ v7.5.1 for the 2025 flood frequency analysis. Additionally, results were extracted from the input and output files for each streamgage and summaries were generated. This dataset includes:

    1) "Station.zip": A subdirectory containing published annual peaks in WATSTORE format (.pkf), specifications used for processing in Program Specification File format (.psf), the output text file containing computation results (.prt), and the export text file (.exp). The files within these subfolders are aggregated by 2-digit downstream order part number (01, 02, and 03) because of the limit on the number of records PeakFQ v7.5.1 can process at once. At-station skew was used to compute these results.

    2) "Weighted.zip": A subdirectory containing published annual peaks in WATSTORE format (.pkf), specifications used for processing in Program Specification File format (.psf), the output text file containing computation results (.prt), and the export text file (.exp). The files within these subfolders are aggregated by 2-digit downstream order part number (01, 02, and 03) because of the limit on the number of records PeakFQ v7.5.1 can process at once. Weighted regional skew was used to compute these results.

    3) Table_3_Unregulated_Frequency_Specs.txt: Contains a summary of the specifications and statistics for at-station frequency analyses for 857 streamgages on unregulated streams in and near Virginia and West Virginia for annual peak streamflows, including period of record; record length; skew specifications; thresholds and counts of PILFs; value, significance, and slope of the Mann-Kendall trend test; mean, median, and station skew for annual peak streamflows analyzed; regional skew; and MSE of station and regional skew.

    4) Table_4_Unregulated_Frequency_Summary.txt: Contains a summary of the results of valid at-station frequency analyses for 813 streamgages on unregulated streams in and near Virginia and West Virginia for annual peak streamflows, including flow and variance values for the 0.995, 0.667, 0.5, 0.2, 0.1, 0.04, 0.02, 0.01, 0.005, and 0.002 annual exceedance probabilities. A combination of at-station skew and weighted regional skew was used where appropriate to compute these results.
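For readers less used to annual exceedance probabilities (AEPs), the short sketch below converts the probabilities listed for Table 4 into the more familiar recurrence intervals, using the standard relation T = 1/AEP. The function names are illustrative and are not part of PeakFQ or the dataset.

```python
# Annual exceedance probabilities listed for Table 4 in the description.
AEPS = [0.995, 0.667, 0.5, 0.2, 0.1, 0.04, 0.02, 0.01, 0.005, 0.002]


def recurrence_interval(aep):
    """Average recurrence interval in years for an annual exceedance probability."""
    if not 0 < aep <= 1:
        raise ValueError("aep must be in (0, 1]")
    return 1.0 / aep


def chance_of_exceedance(aep, years):
    """Probability of at least one exceedance in `years` independent years."""
    return 1.0 - (1.0 - aep) ** years


if __name__ == "__main__":
    for aep in AEPS:
        print(f"AEP {aep}: about {recurrence_interval(aep):.1f} years")
```

For example, the 0.01 AEP flow corresponds to the "100-year" flood, which still has roughly a 26% chance of being exceeded at least once in any 30-year span if years are treated as independent.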

  16. At-site flood frequency for 422 streamgages in parts of the Upper...

    • data.usgs.gov
    • catalog.data.gov
    Updated Mar 15, 2025
    Daniel Wagner; Padraic O'Shea (2025). At-site flood frequency for 422 streamgages in parts of the Upper Mississippi and Souris-Red-Rainy basins and surrounding areas in the United States, using data through water year 2013 [Dataset]. http://doi.org/10.5066/P1QWRRAQ
    Explore at:
    Dataset updated
    Mar 15, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Daniel Wagner; Padraic O'Shea
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Time period covered
    Dec 31, 2024
    Area covered
    United States
    Description

    This dataset contains site information and results of flood-frequency analysis for 422 streamflow gaging stations (streamgages) operated by the U.S. Geological Survey (USGS) in parts of the Upper Mississippi and Souris-Red-Rainy basins and surrounding areas in the United States. Annual peak-flow data from the 1844 - 2013 water years were used in the study (U.S. Geological Survey, 2024). Following federal guidelines for flood-frequency analysis (Bulletin 17C; England and others, 2018) and methods outlined in recent flood-frequency reports for the region (Eash and others, 2013; Southard and Veilleux, 2014; Levin and Sanocki, 2023; Sanocki and Levin, 2023), the Expected Moments Algorithm (EMA) was used in version 7.5.1 of USGS PeakFQ software (Veilleux and others, 2014; Flynn and others, 2006; https://water.ugsgs.gov/software/PeakFQ/) to conduct the analyses. Results of the analyses, specifically the at-site skew and its mean squared error, are intended for use in Bayesian weighte ...

  17. Data from: Using social parasitism to test reproductive skew models in a...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Jun 10, 2014
    Jonathan P. Green; Michael A. Cant; Jeremy Field (2014). Using social parasitism to test reproductive skew models in a primitively eusocial wasp [Dataset]. http://doi.org/10.5061/dryad.84mf4
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jun 10, 2014
    Dataset provided by
    University of Exeter
    University of Liverpool
    University of Sussex
    Authors
    Jonathan P. Green; Michael A. Cant; Jeremy Field
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Remarkable variation exists in the distribution of reproduction (skew) among members of cooperatively breeding groups, both within and between species. Reproductive skew theory has provided an important framework for understanding this variation. In the primitively eusocial Hymenoptera, two models have been routinely tested: concessions models, which assume complete control of reproduction by a dominant individual, and tug-of-war models, which assume on-going competition among group members over reproduction. Current data provide little support for either model, but uncertainty about the ability of individuals to detect genetic relatedness and difficulties in identifying traits conferring competitive ability mean that the relative importance of concessions versus tug-of-war remains unresolved. Here, we suggest that the use of social parasitism to generate meaningful variation in key social variables represents a valuable opportunity to explore the mechanisms underpinning reproductive skew within the social Hymenoptera. We present a direct test of concessions and tug-of-war models in the paper wasp Polistes dominulus by exploiting pronounced changes in relatedness and power structures that occur following replacement of the dominant by a congeneric social parasite. Comparisons of skew in parasitized and unparasitized colonies are consistent with a tug-of-war over reproduction within P. dominulus groups, but provide no evidence for reproductive concessions.

  18. Data from: Sediment particle size analysis for stations from the Western...

    • data-search.nerc.ac.uk
    • hosted-metadata.bgs.ac.uk
    http
    Updated Jul 25, 2020
    UK Polar Data Centre, Natural Environment Research Council, UK Research & Innovation (2020). Sediment particle size analysis for stations from the Western Barents Sea for summer 2017 and 2018 [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/api/records/GB_NERC_BAS_PDC_01373
    Explore at:
    httpAvailable download formats
    Dataset updated
    Jul 25, 2020
    Dataset provided by
    Natural Environment Research Council (https://www.ukri.org/councils/nerc)
    Authors
    UK Polar Data Centre, Natural Environment Research Council, UK Research & Innovation
    Time period covered
    Jul 19, 2018 - Jul 28, 2018
    Area covered
    Description

    Sediment particle size frequency distributions from the USNL (United States Naval Laboratory) box cores were determined optically using a Malvern Mastersizer 2000 He-Ne LASER diffraction sizer and were used to resolve mean particle size, sorting, skewness and kurtosis.

    Samples were collected on cruises JR16006 and JR17007.

    Funding was provided by the "Changing Arctic Ocean Seafloor (ChAOS) - how changing sea ice conditions impact biological communities, biogeochemical processes and ecosystems" project (NE/N015894/1 and NE/P006426/1, 2017-2021), part of the NERC funded Changing Arctic Ocean programme.

  19. Admixture and reproductive skew shape the conservation value of ex situ...

    • datasetcatalog.nlm.nih.gov
    • datadryad.org
    Updated Aug 15, 2024
    Horsburgh, Gavin; Shultz, Susanne; King, Tony; Dawson, Deborah; Walton, Catherine; Kretzschmar, Petra; Hruby, Jírí; Elsner-Gearing, Franziska; Pilgrim, Mark; Hopper, Jane (2024). Admixture and reproductive skew shape the conservation value of ex situ populations of the Critically Endangered eastern black rhino - microsatellite and mitochondrial genotype data [Dataset]. http://doi.org/10.5061/dryad.69p8cz97p
    Explore at:
    Dataset updated
    Aug 15, 2024
    Authors
    Horsburgh, Gavin; Shultz, Susanne; King, Tony; Dawson, Deborah; Walton, Catherine; Kretzschmar, Petra; Hruby, Jírí; Elsner-Gearing, Franziska; Pilgrim, Mark; Hopper, Jane
    Description

    Small populations of endangered species risk losing already eroded genetic diversity, important for adaptive potential, through the effects of genetic drift. The magnitude of drift can be mitigated by maximising the effective population size, as is the goal of genetic management strategies. Different mating systems, specifically those leading to reproductive skew, exacerbate genetic drift by distorting contributions. In the absence of an active management strategy, reproductive skew will have long-term effects on the genetic composition of a population, particularly where admixture is present. Here we examine the contrasting effects of conservation management strategies in two ex situ populations of the Critically Endangered eastern black rhino (Diceros bicornis michaeli), one managed as a semi-wild population in South Africa (SAx), and one managed under a mean-kinship breeding strategy in European zoos. We use molecular data to reconstruct pedigrees for both populations and validate the method using the zoo studbook. Using the reconstructed pedigree and studbook we show there is male sex-specific skew in both populations. However, the zoo’s mean-kinship breeding strategy effectively reduces reproductive skew in comparison to a semi-wild population with little genetic management. We also show that strong male reproductive skew in SAx has resulted in extensive admixture, which may require a re-evaluation of the population’s original intended role in the black rhino meta-population. With a high potential for admixture in many ex situ populations of endangered species, molecular and pedigree data remain vital tools for populations needing to balance drift and selection.

  20. Data for On The Finite-Size Lyapunov Exponent For The Schrodinger Operator...

    • zenodo.org
    Updated Jan 24, 2020
    Paul Michael Kielstra; Marius Lemm (2020). Data for On The Finite-Size Lyapunov Exponent For The Schrodinger Operator With Skew-Shift Potential [Dataset]. http://doi.org/10.5281/zenodo.2638904
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Paul Michael Kielstra; Marius Lemm
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
    License information was derived automatically

    Description

    A ZIP file of computer code and output used in the numerical calculations for On The Finite-Size Lyapunov Exponent For The Schrodinger Operator With Skew-Shift Potential by Paul Michael Kielstra and Marius Lemm. The ZIP decompresses to about 26GB, containing multiple files:

    201x201 bad set grid.txt: A list of 201x201=40401 evenly spaced points on [0, 1]x[0, 1], each written in the form (x, y) and followed by 30000 values of E which are probably bad for that point. This gives a total of 40401x30001=1212070401 lines.

    Upper bounds.txt: individual upper bounds for equation (9) calculated at various points. The bound in this equation in the published paper is the worst of these.

    E=0/N/2001x2001 grid.tsv: A tab-separated values file of 2001x2001=4004001 evenly spaced points on [0, 1]x[0, 1], with headers:
    X: The x-coordinate of the point represented by the line in question.
    Y: The y-coordinate.
    Exact_x, Exact_y: The x- and y-coordinates to the maximum precision the computer used. These are included in case, for instance, the x-coordinate is defined to be 0.5 but is actually 0.5000000000000001 in memory.
    Matrix: The matrix generated at this point, modulo a certain normalization (see below).
    Result: The log of the norm of the matrix. This has been corrected for the normalization -- it is calculated as if the matrix had never been normalized.
    Normalizationcount: The actual matrix generated is too large to store in memory, so the matrix we store and output is (Matrix)x(Normalizer^Normalizationcount). We used a normalizer of 0.01.
    This file was calculated with the values E=0, N=30000, lambda=1/2. The header line means that this file contains 4004001+1=4004002 lines in total.
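The relationship between the Matrix, Normalizationcount and Result columns can be sketched as below. This is a hedged Python illustration, not a port of the MATLAB code in the archive. The Frobenius norm, the rescaling threshold and all function names are assumptions; only the normalizer of 0.01 and the relation stored = (Matrix) x (Normalizer^Normalizationcount) come from the description.

```python
import math

NORMALIZER = 0.01  # value stated in the dataset description


def frobenius_norm(m):
    # Assumption: the description does not say which matrix norm is used.
    return math.sqrt(sum(v * v for row in m for v in row))


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def normalized_product(matrices, threshold=1e6):
    """Multiply matrices, rescaling by NORMALIZER whenever the norm grows large.

    Returns the stored (rescaled) product and the number of rescalings.
    The threshold rule is hypothetical.
    """
    product = [[1.0, 0.0], [0.0, 1.0]]
    count = 0
    for m in matrices:
        product = matmul2(m, product)
        while frobenius_norm(product) > threshold:
            product = [[v * NORMALIZER for v in row] for row in product]
            count += 1
    return product, count


def corrected_log_norm(stored, count):
    """Log norm of the unnormalized matrix, recovered from the stored one."""
    # stored = true * NORMALIZER**count, so log|true| = log|stored| - count*log(NORMALIZER)
    return math.log(frobenius_norm(stored)) - count * math.log(NORMALIZER)


if __name__ == "__main__":
    stored, count = normalized_product([[[3.0, 0.0], [0.0, 3.0]]] * 100)
    print(count, corrected_log_norm(stored, count))
```

Because each rescaling multiplies the norm by 0.01, undoing it in log space only requires subtracting `count * log(0.01)`, which is why the Result column can report the log norm as if no normalization had happened.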

    E=0/N/2001x2001 random grid.tsv: As with the 2001x2001 grid.tsv file, but missing the exact_x and exact_y coordinates. Instead, the x and y values are both exact and randomly chosen. The lines in the file are in no particular order. This file contains the data for the Monte Carlo approximation used in the paper.

    E=0/2N/2001x2001 grid.tsv: As with its counterpart in the folder labeled N, but calculated with N=60000 instead.

    E=-2.495: As with its counterpart E=0, but everything is calculated with E=-2.495123260049612 (which we round to -2.49512326 in the paper). This folder also contains no random or Monte Carlo calculations.

    Code/Multiplier.m: MATLAB code to generate the skew matrix at a given point.

    Code/Iterator.m: MATLAB code to iterate over a series of points and call Multiplier at each.

    Code/Striper.m: MATLAB code to split up the input space into a series of stripes and call Iterator on exactly one of them. We performed our calculations in parallel, each job consisting of calling Striper on a different stripe number.

    Code/Badfinder.m: MATLAB code to take a point and output a series of E-values for which that point is in the bad set.

    Code/BadSetIterator.m: As with Iterator.m, but calls Badfinder.

    Code/BadSetStriper.m: As with Striper.m, but calls BadSetIterator. (The function in this file is also called Striper.)
