10 datasets found
  1. Code from: Testing for normality in regression models: mistakes abound (but...

    • search.dataone.org
    Updated May 17, 2025
    Cite
    Stephen Midway; J. Wilson White (2025). Code from: Testing for normality in regression models: mistakes abound (but may not matter) [Dataset]. http://doi.org/10.5061/dryad.sqv9s4nd0
    Explore at:
    Dataset updated
    May 17, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Stephen Midway; J. Wilson White
    Description

    This study examines the misuse of normality tests in linear regression within ecology and biology, focusing on common misconceptions. A bibliometric review found that over 70% of ecology papers and 90% of biology papers incorrectly applied normality tests to raw data instead of model residuals. To assess the impact of this error, we simulated datasets with normal, interval, and skewed distributions across various sample and effect sizes. We compared statistical power between two approaches: testing the whole dataset for normality (incorrect) versus testing model residuals (correct) to determine whether to use a parametric (t-test) or nonparametric (Mann-Whitney U test) method. Our results showed minimal differences in statistical power between the approaches, even when normality was incorrectly tested on raw data. However, when residuals violated the normality assumption, using the Mann-Whitney U test increased statistical power by 3–4%. Overall, the study suggests that, while correctly...

    # Code for Normality Test Study

    https://doi.org/10.5061/dryad.sqv9s4nd0

    Description of the data and file structure

    The data files include those required to reproduce the analysis in "It’s OK Not to be Normal: Usage of Normality Tests in Linear Models" by S.R. Midway and J.W. White.

    Files and variables

    File: Normality_Code.zip

    Description: Unzips to 5 files. "interval_sims.R", "lognormal_sims.R", and "normal_sims.R" are R scripts that generate the data used in the study, each based on its respective distribution. "normality_comp.R" is an R script to reproduce the comparison of different tests of normality. "workflows_power.R" is an R script that reproduces the three analytical decisions in the manuscript.

    Code/software

    All code is included in the attached files. All of the code consists of R scripts that can be run in the free software environment R, with the associated libraries specified in the scripts.
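    The dataset's own scripts are in R and are not reproduced here. Purely as a hedged illustration, the two workflows the abstract compares can be sketched in Python with SciPy; all names, thresholds, and simulated data below are my own, not the authors':

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
a = rng.lognormal(mean=0.0, sigma=0.5, size=50)   # skewed group 1
b = rng.lognormal(mean=0.3, sigma=0.5, size=50)   # skewed group 2

def choose_test(values):
    """Pick t-test vs Mann-Whitney U from a Shapiro-Wilk test on `values`."""
    normal = stats.shapiro(values).pvalue > 0.05
    if normal:
        return "t-test", stats.ttest_ind(a, b).pvalue
    return "Mann-Whitney U", stats.mannwhitneyu(a, b).pvalue

# Incorrect workflow: test the pooled raw data for normality.
raw_choice, raw_p = choose_test(np.concatenate([a, b]))

# Correct workflow: test the model residuals (for a two-group model,
# each observation minus its group mean).
res_choice, res_p = choose_test(np.concatenate([a - a.mean(), b - b.mean()]))
```

    Either branch yields a p-value for the group difference; the study's point is that the two branching rules rarely lead to meaningfully different power.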

    Access information

    ...

  2. Data from: Diagnostic Testing of Finite Moment Conditions for the...

    • tandf.figshare.com
    txt
    Updated Feb 29, 2024
    Cite
    Yuya Sasaki; Yulong Wang (2024). Diagnostic Testing of Finite Moment Conditions for the Consistency and Root-N Asymptotic Normality of the GMM and M Estimators [Dataset]. http://doi.org/10.6084/m9.figshare.17257542.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    Feb 29, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    Yuya Sasaki; Yulong Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Common econometric analyses based on point estimates, standard errors, and confidence intervals presume the consistency and the root-n asymptotic normality of the GMM or M estimators. However, the key assumption that the data have finite moments may not always be satisfied in applications. This article proposes a method of diagnostic testing for these key assumptions, with applications to both simulated and real datasets.

  3. Challenging Assumptions of Normality in AES s-Box Configurations Under...

    • figshare.com
    xz
    Updated Nov 28, 2023
    Cite
    Clay Carper (2023). Challenging Assumptions of Normality in AES s-Box Configurations Under Side-Channel Analysis -- Experimental Data [Dataset]. http://doi.org/10.6084/m9.figshare.24650373.v1
    Explore at:
    Available download formats: xz
    Dataset updated
    Nov 28, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Clay Carper
    License

    GNU GPL 3.0: https://www.gnu.org/licenses/gpl-3.0.html

    Description

    Article Abstract: Power-based Side-Channel Analysis (SCA) began with visual-based examinations and has progressed to utilize data-driven statistical analysis. Two distinct classifications of these methods have emerged over the years: those focused on leakage exploitation and those dedicated to leakage detection. This work primarily focuses on a leakage detection-based schema that utilizes Welch's t-test, known as Test Vector Leakage Assessment (TVLA). Both classes of methods process collected data using statistical frameworks that result in the successful exfiltration of information via SCA. Often, statistical testing used during analysis requires the assumption that collected power consumption data originate from a normal distribution. To date, this assumption has remained largely uncontested. This work seeks to demonstrate that while past studies have assumed the normality of collected power traces, this assumption should be properly evaluated. To evaluate it, experiments are conducted on an implementation of Tiny-AES-c with nine unique substitution-box (s-box) configurations, using TVLA to guide experimental design. By leveraging the complexity of the AES algorithm, a sufficiently diverse and complex dataset was developed. Under this dataset, statistical tests for normality such as the Shapiro-Wilk test and the Kolmogorov-Smirnov test provide significant evidence to reject the null hypothesis that the power consumption data are normally distributed. To address this observation, existing non-parametric equivalents such as the Wilcoxon Signed-Rank Test and the Kruskal-Wallis Test are discussed in relation to currently used parametric tests such as Welch's t-test.
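    As a hedged sketch (not the article's code or its measured traces), the normality checks and the parametric/non-parametric pairing named in the abstract can be exercised in Python with SciPy; the gamma-distributed "traces" below are purely illustrative stand-ins for skewed power measurements:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
fixed_traces  = rng.gamma(shape=2.0, scale=1.0, size=2000)  # skewed, non-normal
random_traces = rng.gamma(shape=2.2, scale=1.0, size=2000)

# Normality checks named in the abstract: small p-values reject normality.
sw_p = stats.shapiro(fixed_traces[:500]).pvalue
ks_p = stats.kstest(fixed_traces, "norm",
                    args=(fixed_traces.mean(), fixed_traces.std())).pvalue

# Parametric TVLA-style comparison (Welch's t-test) next to one of the
# discussed non-parametric alternatives (Kruskal-Wallis).
t_p = stats.ttest_ind(fixed_traces, random_traces, equal_var=False).pvalue
kw_p = stats.kruskal(fixed_traces, random_traces).pvalue
```

    When the normality checks reject (as they do for skewed traces like these), the rank-based test is the defensible choice for the leakage comparison.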

  4. Supplementary information for "Bayesian inference general procedures for a...

    • repository.lboro.ac.uk
    pdf
    Updated Jun 9, 2025
    Cite
    Jie Li; Gary Green; Sarah JA Carr; Peng Liu; Jian Zhang (2025). Supplementary information for "Bayesian inference general procedures for a single-subject test study" [Dataset]. http://doi.org/10.17028/rd.lboro.29266982.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 9, 2025
    Dataset provided by
    Loughborough University
    Authors
    Jie Li; Gary Green; Sarah JA Carr; Peng Liu; Jian Zhang
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Supplementary files for the article "Bayesian inference general procedures for a single-subject test study".

    Detecting a single subject that deviates from the majority of a control-group dataset is a fundamental problem. Typically, the control group is characterised using standard Normal statistics, and a single abnormal subject is detected in that context. However, in many situations the control group cannot be described by Normal statistics, making standard statistical methods inappropriate. This paper presents Bayesian Inference General Procedures for A Single-subject Test (BIGPAST), designed to mitigate the effects of skewness under the assumption that the control-group dataset comes from the skewed Student t distribution. BIGPAST operates under the null hypothesis that the single subject follows the same distribution as the control group. We assess BIGPAST's performance against other methods through simulation studies. The results demonstrate that BIGPAST is robust against deviations from normality and outperforms the existing approaches in accuracy, coming closest to the nominal accuracy of 0.95. BIGPAST can reduce model misspecification errors under the skewed Student t assumption by up to 12 times, as demonstrated in Section 3.3. We apply BIGPAST to a magnetoencephalography (MEG) dataset consisting of an individual with mild traumatic brain injury and an age- and gender-matched control group. The previous method failed to detect abnormalities in 8 brain areas, whereas BIGPAST successfully identified them, demonstrating its effectiveness in detecting abnormalities in a single subject. © The Author(s), CC BY-NC 4.0

  5. Experimental results on swimming behaviors of four species of cnidarian...

    • datacart.bco-dmo.org
    • bco-dmo.org
    csv
    Updated Jun 29, 2016
    Cite
    Kelly Rakow Sutherland (2016). Experimental results on swimming behaviors of four species of cnidarian hydromedusae at Friday Harbor in 2012 (Jellyfish predation in turbulence project) [Dataset]. https://datacart.bco-dmo.org/dataset/650306
    Explore at:
    Available download formats: csv (5.90 KB)
    Dataset updated
    Jun 29, 2016
    Dataset provided by
    Biological and Chemical Data Management Office
    Authors
    Kelly Rakow Sutherland
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    id, tank, species, NGDR_mean, speed_max, treatment, accel_mean, depth_mean, speed_mean, swimtime_pcent_mean
    Measurement technique
    Camera
    Description

    Swimming behaviors of four species of cnidarian hydromedusae (Aequorea victoria, Mitrocoma cellularia, Stomotoca atra, Aglantha digitale) exposed to two flow conditions in a laboratory turbulence generator, still water and turbulence (ε ≈ 10⁻⁷ m² s⁻³), were examined.

    A two-way ANOVA was used to test for significant effects of species, flow level (still and turbulent) and their interaction on swimming behavior parameters, including depth in the tank, observed speed, acceleration, NGDR, and time spent swimming. Raw data that did not meet the assumption of normality were square root transformed. Proportion data (NGDR, and time spent swimming) that did not meet the assumption of normality were arcsine square root transformed, which is appropriate for proportion data (Zar, 1999).
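    The two transformations described above are one-liners; a minimal NumPy sketch with illustrative values (not the study's data):

```python
import numpy as np

speeds = np.array([4.0, 9.0, 16.0])     # illustrative raw, non-normal measurements
speeds_sqrt = np.sqrt(speeds)           # square-root transform

props = np.array([0.10, 0.50, 0.90])    # illustrative proportions (e.g. % time swimming)
props_asr = np.arcsin(np.sqrt(props))   # arcsine square-root transform
```

    The arcsine square-root transform maps proportions in [0, 1] onto [0, π/2] and spreads out values near the boundaries, which is why it is the conventional choice for proportion data (Zar, 1999).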

    Related Datasets:
    HydroSwimParams_N
    HydroSwimParams_IndStats

  6. Data from: MONITORING TRACTOR PERFORMANCE USING SHEWHART AND EXPONENTIALLY...

    • scielo.figshare.com
    jpeg
    Updated Jun 2, 2023
    Cite
    Murilo A. Voltarelli; Carla S. S. Paixão; Bruno R. de Oliveira; Eduardo P. Angelo; Rouverson P. da Silva (2023). MONITORING TRACTOR PERFORMANCE USING SHEWHART AND EXPONENTIALLY WEIGHTED MOVING AVERAGE CHARTS [Dataset]. http://doi.org/10.6084/m9.figshare.14279788.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    SciELO journals
    Authors
    Murilo A. Voltarelli; Carla S. S. Paixão; Bruno R. de Oliveira; Eduardo P. Angelo; Rouverson P. da Silva
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT Statistical process control has been widely used in agricultural operations for monitoring and improving process quality. This study aims to evaluate the Shewhart and exponentially weighted moving average (EWMA) control charts for monitoring the performance of an agricultural tractor–planter set. The design is completely randomized, based on the assumptions of statistical process control, and comprises two treatments: day and night shifts. The data to assess the performance of the tractor–planter set are collected during the day and night shifts and used to evaluate the operating speed, motor rotation, engine oil pressure, water temperature, and hourly fuel consumption. The dataset comprised 40 samples compiled from the front monitor column inside the tractor cab. It is concluded that both Shewhart and EWMA control charts can be used to evaluate engine performance based on the quality indicator parameters investigated, regardless of whether the datasets meet the normality assumption.
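    A hedged sketch of an EWMA chart for one quality indicator (say, hourly fuel consumption); the smoothing weight λ = 0.2, limit width L = 3, and the sample series are common textbook defaults and invented numbers, not values from the study:

```python
import numpy as np

def ewma_chart(x, lam=0.2, L=3.0):
    """Return the EWMA statistic z_t and its control limits for series x."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std(ddof=1)
    z = np.empty_like(x)
    prev = mu
    for i, xt in enumerate(x):
        prev = lam * xt + (1 - lam) * prev     # z_t = λ x_t + (1-λ) z_{t-1}
        z[i] = prev
    # Time-varying limits: μ ± L σ sqrt(λ/(2-λ) · (1-(1-λ)^{2t}))
    t = np.arange(1, len(x) + 1)
    half = L * sigma * np.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
    return z, mu - half, mu + half

z, lcl, ucl = ewma_chart([10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.3, 9.7])
in_control = np.all((z >= lcl) & (z <= ucl))
```

    A Shewhart chart is the special case λ = 1, where each raw point is judged against fixed μ ± Lσ limits; smaller λ makes the chart more sensitive to small, sustained shifts.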

  7. R script and datasets - CCA

    • figshare.com
    txt
    Updated May 30, 2020
    Cite
    Chui Pin Leaw; Po Teen Lim; Li Keat Lee (2020). R script and datasets - CCA [Dataset]. http://doi.org/10.6084/m9.figshare.12356519.v3
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Chui Pin Leaw; Po Teen Lim; Li Keat Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This folder contains an R script and datasets for performing canonical correspondence analysis (CCA) in R. CCA was used to infer the underlying relationship between the benthic harmful dinoflagellate assemblages and benthic substrate characteristics, depths and irradiances. CCA is a constrained multivariate ordination technique that extracts major gradients among combinations of explanatory variables in a dataset and requires samples to be both random and independent. Cell abundance data were Hellinger-transformed prior to CCA to ensure the data met the statistical assumptions of normality and linearity. The analysis was performed using the R package vegan. The significance of the variation in benthic harmful dinoflagellate assemblages explained by the explanatory variables was tested using an ANOVA-like Monte Carlo permutation test as implemented in vegan.
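    The Hellinger transformation mentioned above (vegan's decostand(x, "hellinger") in R) is simple to state: divide each abundance by its row (sample) total, then take the square root. A hedged NumPy sketch with invented abundances, not the dataset's:

```python
import numpy as np

counts = np.array([[10.0, 30.0, 60.0],    # site 1 cell abundances by species
                   [ 5.0,  5.0, 10.0]])   # site 2
# Hellinger transform: sqrt of relative abundance within each sample row.
hellinger = np.sqrt(counts / counts.sum(axis=1, keepdims=True))
```

    After the transform, each row has unit sum of squares, so Euclidean-distance-based ordinations behave sensibly on what were raw count data.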

  8. Data from: Exact and approximate computation of critical values of the...

    • tandf.figshare.com
    xlsx
    Updated May 31, 2023
    Cite
    Gregory Tai Xiang Ang; Zhidong Bai; Kwok Pui Choi; Yasunori Fujikoshi; Jiang Hu (2023). Exact and approximate computation of critical values of the largest root test in high dimension [Dataset]. http://doi.org/10.6084/m9.figshare.14274919.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 31, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Gregory Tai Xiang Ang; Zhidong Bai; Kwok Pui Choi; Yasunori Fujikoshi; Jiang Hu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The difficulty of efficiently computing the null distribution of the largest eigenvalue of a MANOVA matrix has hindered the wider applicability of Roy's Largest Root Test (RLRT), though it was proposed over six decades ago. Recent progress made by Johnstone, Butler and Paige, and Chiani has greatly simplified the approximate and exact computation of the critical values of RLRT. When datasets are high dimensional (HD), Chiani's numerical algorithm for exact computation may not give reliable results due to truncation error, whereas Johnstone's approximation method via the Tracy-Widom distribution is likely to give a good approximation. In this paper, we conduct comparative studies to determine in which region the exact method gives reliable numerical values, and in which region Johnstone's method gives a good-quality approximation. We formulate recommendations to inform practitioners of RLRT. We also conduct simulation studies in the high-dimensional setting to examine the robustness of RLRT against the normality assumption in populations. Our study provides support for the robustness of RLRT against non-normality in HD.

  9. Power and Sample Size Determination in the Rasch Model: Evaluation of the...

    • plos.figshare.com
    doc
    Updated May 30, 2023
    Cite
    Alice Guilleux; Myriam Blanchin; Jean-Benoit Hardouin; Véronique Sébille (2023). Power and Sample Size Determination in the Rasch Model: Evaluation of the Robustness of a Numerical Method to Non-Normality of the Latent Trait [Dataset]. http://doi.org/10.1371/journal.pone.0083652
    Explore at:
    Available download formats: doc
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Alice Guilleux; Myriam Blanchin; Jean-Benoit Hardouin; Véronique Sébille
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Patient-reported outcomes (PRO) have gained importance in clinical and epidemiological research and aim at assessing, for instance, quality of life, anxiety or fatigue. Item Response Theory (IRT) models are increasingly used to validate and analyse PRO. Such models relate observed variables to a latent variable (an unobservable variable) which is commonly assumed to be normally distributed. A priori sample size determination is important to obtain adequately powered studies for detecting clinically important changes in PRO. In previous developments, the Raschpower method was proposed for determining the power of the test of group effect for the comparison of PRO in cross-sectional studies with an IRT model, the Rasch model. The objective of this work was to evaluate the robustness of this method (which assumes a normal distribution for the latent variable) to violations of the distributional assumption. The statistical power of the test of group effect was estimated by the empirical rejection rate in datasets simulated using a non-normally distributed latent variable, and compared to the power obtained with the Raschpower method. In both cases, the data were analyzed using a latent regression Rasch model including a binary covariate for group effect. For all situations, both methods gave comparable results whatever the deviations from the model assumptions. Given these results, the Raschpower method appears robust to non-normality of the latent trait when determining the power of the test of group effect.

  10. Music and Simulated Urban Driving Dataset

    • figshare.com
    bin
    Updated Apr 30, 2021
    Cite
    Costas Karageorghis (2021). Music and Simulated Urban Driving Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.13603334.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Apr 30, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Costas Karageorghis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Checks for univariate outliers were conducted using standardised scores (|z| > 3.29) and for multivariate outliers using the Mahalanobis distance test (p < .001; Tabachnick & Fidell, 2019). Parametric assumptions underlying within-subjects ANOVA (Tabachnick & Fidell, 2019) were assessed (e.g., Q-Q plots and the Shapiro–Wilk test for normality). Initial analyses for the psychological measures (i.e., RSME, NASA-TLX and Affect Grid) were conducted using mixed-model Condition × Personality (M)ANOVAs, and significant F tests were followed up with pairwise/multiple comparisons. Where the assumption of sphericity was violated, Greenhouse–Geisser-adjusted F tests were used.
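    As a hedged illustration of the two outlier screens (the data and variable names are invented; the cutoffs follow the Tabachnick & Fidell conventions cited above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))               # 100 cases, 3 measures (illustrative)
X[0] = [8.0, 8.0, 8.0]                      # planted outlier

# Univariate screen: standardised scores beyond |z| = 3.29.
z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
uni_out = np.any(np.abs(z) > 3.29, axis=1)

# Multivariate screen: squared Mahalanobis distance vs chi-square cutoff at p < .001.
diff = X - X.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
multi_out = d2 > stats.chi2.ppf(0.999, df=X.shape[1])
```

    The chi-square reference distribution has degrees of freedom equal to the number of variables, which is why the cutoff uses `df=X.shape[1]`.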

    Four types of behavioural data were acquired from the urban driving simulation: (a) a risk rating (on a scale from 1 [safe driving] to 4 [reckless driving]) derived from video data (without any audible sound) and pertaining to driving performance over the entire trial; three members of the research team conducted the ratings and inter-rater reliabilities were computed; (b) course completion time (min); (c) mean speed (mph); and (d) accelerator and brake pedal positions (i.e., 0 = no pressure applied, 1 = maximum braking).
