This study examines the misuse of normality tests in linear regression within ecology and biology, focusing on common misconceptions. A bibliometric review found that over 70% of ecology papers and 90% of biology papers incorrectly applied normality tests to raw data instead of model residuals. To assess the impact of this error, we simulated datasets with normal, interval, and skewed distributions across various sample and effect sizes. We compared statistical power between two approaches: testing the whole dataset for normality (incorrect) versus testing model residuals (correct) to determine whether to use a parametric (t-test) or nonparametric (Mann-Whitney U test) method. Our results showed minimal differences in statistical power between the approaches, even when normality was incorrectly tested on raw data. However, when residuals violated the normality assumption, using the Mann-Whitney U test increased statistical power by 3–4%. Overall, the study suggests that, while correctly...

# Code for Normality Test Study
https://doi.org/10.5061/dryad.sqv9s4nd0
The data files include those required to reproduce the analysis in "It’s OK Not to be Normal: Usage of Normality Tests in Linear Models" by S.R. Midway and J.W. White.
Description: Unzips to 5 files. "interval_sims.R", "lognormal_sims.R", and "normal_sims.R" are all R scripts that generate the data used in the study, each based on their respective distribution. "normality_comp.R" is an R script to reproduce the comparison of different tests of normality. "workflows_power.R" is an R script that reproduces the 3 analytical decisions in the manuscript.
All code is included in the attached files. All files are R scripts that can be run in the free software R, with the required libraries specified in the scripts.
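The two decision workflows compared in the study can be sketched outside of R as well. The following is a minimal illustration, not the authors' code: the group data, sample sizes, and alpha level are invented for demonstration, and SciPy stands in for the original R scripts. The only difference between the workflows is which values are submitted to the normality test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
a = rng.lognormal(mean=0.0, sigma=0.75, size=50)  # skewed group 1
b = rng.lognormal(mean=0.4, sigma=0.75, size=50)  # skewed group 2, shifted

def choose_test(check_values, g1, g2, alpha=0.05):
    """Pick a parametric or nonparametric two-sample test based on a
    Shapiro-Wilk normality test applied to `check_values`."""
    _, p_norm = stats.shapiro(check_values)
    if p_norm > alpha:  # normality not rejected -> parametric
        return stats.ttest_ind(g1, g2).pvalue, "t-test"
    return stats.mannwhitneyu(g1, g2).pvalue, "Mann-Whitney U"

# Incorrect workflow: test the pooled raw data for normality.
p_raw, test_raw = choose_test(np.concatenate([a, b]), a, b)

# Correct workflow: test the model residuals (group-mean residuals here,
# equivalent to the residuals of lm(y ~ group) in R).
residuals = np.concatenate([a - a.mean(), b - b.mean()])
p_res, test_res = choose_test(residuals, a, b)
```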
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Common econometric analyses based on point estimates, standard errors, and confidence intervals presume the consistency and the root-n asymptotic normality of the GMM or M estimators. However, the key assumption that the data have finite moments may not always be satisfied in applications. This article proposes a method of diagnostic testing for these key assumptions, with applications to both simulated and real datasets.
https://www.gnu.org/licenses/gpl-3.0.html
Article Abstract: Power-based Side-Channel Analysis (SCA) began with visual-based examinations and has progressed to utilize data-driven statistical analysis. Two distinct classifications of these methods have emerged over the years: those focused on leakage exploitation and those dedicated to leakage detection. This work primarily focuses on a leakage detection-based schema that utilizes Welch's t-test, known as Test Vector Leakage Assessment (TVLA). Both classes of methods process collected data using statistical frameworks that result in the successful exfiltration of information via SCA. Often, statistical testing used during analysis requires the assumption that collected power consumption data originate from a normal distribution. To date, this assumption has remained largely uncontested. This work seeks to demonstrate that while past studies have assumed the normality of collected power traces, this assumption should be properly evaluated. To evaluate it, an implementation of Tiny-AES-c with nine unique substitution-box (s-box) configurations is assessed, using TVLA to guide experimental design. By leveraging the complexity of the AES algorithm, a sufficiently diverse and complex dataset was developed. Under this dataset, statistical tests for normality such as the Shapiro-Wilk test and the Kolmogorov-Smirnov test provide significant evidence to reject the null hypothesis that the power consumption data are normally distributed. To address this observation, existing non-parametric equivalents such as the Wilcoxon Signed-Rank Test and the Kruskal-Wallis Test are discussed in relation to currently used parametric tests such as Welch's t-test.
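The selection logic the article argues for (check normality first, then fall back to a nonparametric test) can be sketched as follows. This is not the paper's TVLA pipeline: the synthetic "power trace" model, the skewed chi-square stand-in for non-Gaussian leakage, and all parameters are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic per-sample power measurements for two s-box configurations;
# a skewed chi-square model stands in for non-Gaussian leakage.
trace_a = rng.chisquare(df=4, size=500)
trace_b = rng.chisquare(df=4, size=500) + 0.3

# Normality checks, as in the article: Shapiro-Wilk and Kolmogorov-Smirnov.
_, p_sw = stats.shapiro(trace_a)
_, p_ks = stats.kstest((trace_a - trace_a.mean()) / trace_a.std(), "norm")

if min(p_sw, p_ks) < 0.05:
    # Normality rejected: use a nonparametric comparison instead of Welch's t.
    stat, p = stats.kruskal(trace_a, trace_b)
    chosen = "Kruskal-Wallis"
else:
    stat, p = stats.ttest_ind(trace_a, trace_b, equal_var=False)  # Welch's t
    chosen = "Welch t-test"
```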
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Supplementary files for the article "Bayesian inference general procedures for a single-subject test study". Detecting a single subject that deviates from the majority of a control-group dataset is a fundamental problem. Typically, the control group is characterised using standard Normal statistics, and a single abnormal subject is detected in that context. However, in many situations the control group cannot be described by Normal statistics, making standard statistical methods inappropriate. This paper presents Bayesian Inference General Procedures for A Single-subject Test (BIGPAST), designed to mitigate the effects of skewness under the assumption that the control-group dataset comes from a skewed Student t distribution. BIGPAST operates under the null hypothesis that the single subject follows the same distribution as the control group. We assess BIGPAST's performance against other methods through simulation studies. The results demonstrate that BIGPAST is robust against deviations from normality and outperforms existing approaches in accuracy, coming nearest to the nominal accuracy of 0.95. BIGPAST can reduce model misspecification errors under the skewed Student t assumption by up to 12 times, as demonstrated in Section 3.3. We apply BIGPAST to a Magnetoencephalography (MEG) dataset consisting of an individual with mild traumatic brain injury and an age- and gender-matched control group. The previous method failed to detect abnormalities in 8 brain areas, whereas BIGPAST successfully identified them, demonstrating its effectiveness in detecting abnormalities in a single subject. © The Author(s), CC BY-NC 4.0
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Swimming behaviors of four species of cnidarian hydromedusae (Aequorea victoria, Mitrocoma cellularia, Stomotoca atra, Aglantha digitale) were examined under two flow conditions in a laboratory turbulence generator: still water and turbulence (ε ≈ 10⁻⁷ m² s⁻³).
A two-way ANOVA was used to test for significant effects of species, flow level (still and turbulent) and their interaction on swimming behavior parameters, including depth in the tank, observed speed, acceleration, NGDR, and time spent swimming. Raw data that did not meet the assumption of normality were square root transformed. Proportion data (NGDR, and time spent swimming) that did not meet the assumption of normality were arcsine square root transformed, which is appropriate for proportion data (Zar, 1999).
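The two transformations described above are simple to state. A brief sketch follows (the example values are invented; the original analysis software is not specified here):

```python
import numpy as np

speed = np.array([0.8, 1.2, 3.5, 0.4, 2.1])      # right-skewed raw measurement
ngdr = np.array([0.92, 0.75, 0.60, 0.88, 0.33])  # proportion data in [0, 1]

speed_t = np.sqrt(speed)            # square-root transform for skewed raw data
ngdr_t = np.arcsin(np.sqrt(ngdr))   # arcsine square-root for proportions (Zar, 1999)
```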
Related Datasets:
HydroSwimParams_N
HydroSwimParams_IndStats
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT Statistical process control has been widely used in agricultural operations for monitoring and improving process quality. This study aims to evaluate the Shewhart and exponentially weighted moving average (EWMA) control charts for monitoring the performance of an agricultural tractor–planter set. The design is completely randomized, based on the assumptions of statistical process control, and comprises two treatments: a day shift and a night shift. The data to assess the performance of the tractor–planter set are collected during the day and night shifts and used to evaluate operating speed, motor rotation, engine oil pressure and water temperature, and hourly fuel consumption. The dataset comprised 40 samples compiled from the frontal monitor column inside the tractor cab. It is concluded that both Shewhart and EWMA control charts can be used to evaluate engine performance based on the quality indicator parameters investigated, regardless of the normality assumption of the datasets.
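For readers unfamiliar with the EWMA chart, a minimal sketch of the statistic and its time-varying control limits follows. The smoothing constant λ = 0.2, the limit width L = 3, and the simulated fuel-consumption series are illustrative assumptions, not the study's settings.

```python
import numpy as np

def ewma_chart(x, lam=0.2, L=3.0):
    """Return the EWMA series and its lower/upper control limits."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std(ddof=1)
    z = np.empty_like(x)
    z[0] = lam * x[0] + (1 - lam) * mu
    for i in range(1, len(x)):
        z[i] = lam * x[i] + (1 - lam) * z[i - 1]
    # Exact (time-varying) limits; they widen toward the asymptotic value.
    t = np.arange(1, len(x) + 1)
    half_width = L * sigma * np.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
    return z, mu - half_width, mu + half_width

rng = np.random.default_rng(7)
fuel = rng.normal(12.0, 0.5, size=40)   # e.g. hourly fuel consumption, 40 samples
z, lcl, ucl = ewma_chart(fuel)
out_of_control = np.flatnonzero((z < lcl) | (z > ucl))  # indices of signals
```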
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This folder contains an R script and datasets to perform a canonical correspondence analysis (CCA) in R. CCA was used to infer the underlying relationship between benthic harmful dinoflagellate assemblages and benthic substrate characteristics, depths, and irradiances. CCA is a constrained multivariate ordination technique that extracts major gradients among combinations of explanatory variables in a dataset and requires samples to be both random and independent. Cell abundance data were Hellinger-transformed prior to CCA to ensure the data met the statistical assumptions of normality and linearity. The analysis was performed using the vegan package. The significance of variation in benthic harmful dinoflagellate assemblages explained by the explanatory variables was tested using an ANOVA-like Monte Carlo permutation test as implemented in vegan.
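The Hellinger transformation (available in vegan as `decostand(x, "hellinger")`) is simple to state: divide each abundance by its row (sample) total, then take the square root. A NumPy sketch with an invented samples-by-taxa matrix:

```python
import numpy as np

abundances = np.array([[10, 0, 5],
                       [ 2, 8, 0],
                       [ 0, 1, 9]], dtype=float)   # samples x taxa (illustrative)

row_totals = abundances.sum(axis=1, keepdims=True)
hellinger = np.sqrt(abundances / row_totals)       # rows now have unit sum of squares
```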
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The difficulty of efficiently computing the null distribution of the largest eigenvalue of a MANOVA matrix has hindered the wider applicability of Roy's Largest Root Test (RLRT), even though the test was proposed over six decades ago. Recent progress by Johnstone, by Butler and Paige, and by Chiani has greatly simplified the approximate and exact computation of the critical values of RLRT. When datasets are high dimensional (HD), Chiani's numerical algorithm for exact computation may not give reliable results due to truncation error, while Johnstone's approximation method via the Tracy-Widom distribution is likely to give a good approximation. In this paper, we conduct comparative studies of the region in which the exact method gives reliable numerical values and the region in which Johnstone's method gives a good-quality approximation. We formulate recommendations to inform practitioners of RLRT. We also conduct simulation studies in the high dimensional setting to examine the robustness of RLRT against the normality assumption in populations. Our study provides support for the robustness of RLRT against non-normality in HD.
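Roy's largest root is the largest eigenvalue of H(H + E)⁻¹, where H and E are the hypothesis and error SSCP matrices; under the null both are Wishart. A brute-force Monte Carlo sketch of its null distribution follows (the dimension and degrees of freedom are illustrative; this is not the exact or Tracy-Widom computation discussed above, only a simulation cross-check):

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 5, 3, 30          # dimension, hypothesis df, error df (assumed)

def roys_root(rng):
    """One draw of the largest eigenvalue of H(H + E)^{-1} under the null."""
    H = rng.standard_normal((m, p))
    E = rng.standard_normal((n, p))
    Hm, Em = H.T @ H, E.T @ E   # Wishart(p, m) and Wishart(p, n)
    # eigvals of (Hm + Em)^{-1} Hm equal those of Hm (Hm + Em)^{-1}
    eigvals = np.linalg.eigvals(np.linalg.solve(Hm + Em, Hm))
    return float(np.max(eigvals.real))

null_draws = np.array([roys_root(rng) for _ in range(2000)])
crit_95 = np.quantile(null_draws, 0.95)   # approximate RLRT critical value
```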
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Patient-reported outcomes (PRO) have gained importance in clinical and epidemiological research and aim to assess, for instance, quality of life, anxiety, or fatigue. Item Response Theory (IRT) models are increasingly used to validate and analyse PRO. Such models relate observed variables to a latent (unobservable) variable, which is commonly assumed to be normally distributed. A priori sample size determination is important to obtain adequately powered studies to detect clinically important changes in PRO. In previous work, the Raschpower method was proposed for determining the power of the test of group effect for the comparison of PRO in cross-sectional studies with an IRT model, the Rasch model. The objective of this work was to evaluate the robustness of this method (which assumes a normal distribution for the latent variable) to violations of the distributional assumption. The statistical power of the test of group effect was estimated by the empirical rejection rate in datasets simulated using a non-normally distributed latent variable. It was compared to the power obtained with the Raschpower method. In both cases, the data were analyzed using a latent regression Rasch model including a binary covariate for the group effect. For all situations, both methods gave comparable results regardless of the deviations from the model assumptions. Given these results, the Raschpower method appears robust to non-normality of the latent trait for determining the power of the test of group effect.
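The empirical-power procedure described above (simulate many datasets under a non-normal latent distribution, analyze each, and take the rejection rate as the power estimate) is generic. In the sketch below, a plain two-sample t-test stands in for the latent regression Rasch model, which requires dedicated software; the centered chi-square latent trait, effect size, and sample sizes are all illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_sims, n_per_group, effect, alpha = 1000, 100, 0.4, 0.05

rejections = 0
for _ in range(n_sims):
    # Non-normally distributed latent trait: centered chi-square draws.
    g0 = rng.chisquare(df=4, size=n_per_group) - 4
    g1 = rng.chisquare(df=4, size=n_per_group) - 4 + effect
    if stats.ttest_ind(g0, g1).pvalue < alpha:
        rejections += 1

empirical_power = rejections / n_sims   # rejection rate = power estimate
```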
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Checks for univariate outliers were conducted using standardised scores (z > ± 3.29) and for multivariate outliers using the Mahalanobis distance test (p < .001; Tabachnick & Fidell, 2019). Parametric assumptions that underlie within-subjects ANOVA (Tabachnick & Fidell, 2019) were assessed (e.g., Q-Q plots and the Shapiro–Wilk test for normality). Initial analyses for the psychological measures (i.e., RSME, NASA-TLX and Affect Grid) were conducted using mixed-model Condition × Personality (M)ANOVAs and significant F tests were followed up with pairwise/multiple comparisons. Where the assumption of sphericity was violated, Greenhouse–Geisser-adjusted F tests were used.
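The two outlier checks can be sketched as follows, with the cutoffs cited above from Tabachnick and Fidell (2019); the data matrix and planted outlier are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
X = rng.normal(size=(60, 3))          # 60 participants x 3 measures (illustrative)
X[0] = [8.0, 8.0, 8.0]                # planted extreme case

# Univariate: standardised scores beyond |z| = 3.29.
z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
uni_outliers = np.flatnonzero(np.any(np.abs(z) > 3.29, axis=1))

# Multivariate: squared Mahalanobis distance against chi-square, p < .001.
diff = X - X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
cutoff = stats.chi2.ppf(0.999, df=X.shape[1])
multi_outliers = np.flatnonzero(d2 > cutoff)
```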
Four types of behavioural data were acquired from the urban driving simulation: (a) a risk rating (on a scale from 1 [safe driving] to 4 [reckless driving]) derived from video data (without any audible sound) and pertaining to driving performance over the entire trial; three members of the research team conducted the ratings, and inter-rater reliabilities were computed; (b) course completion time (min); (c) mean speed (mph); and (d) accelerator and brake pedal positions (i.e., 0 = no pressure applied, 1 = maximum braking).