MIT License https://opensource.org/licenses/MIT
License information was derived automatically
R scripts containing statistical data analysis for streamflow and sediment data, including Flow Duration Curves, Double Mass Analysis, Nonlinear Regression Analysis for Suspended Sediment Rating Curves, and Stationarity Tests, together with several plots.
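The original scripts are in R; as a language-agnostic illustration, a flow duration curve reduces to sorting the flows in descending order and assigning each an exceedance probability. A minimal Python sketch, assuming the common Weibull plotting position p = rank/(n+1) (the actual scripts may use a different convention), with fabricated flow values:

```python
# Minimal sketch of a flow duration curve (FDC): sort flows in descending
# order and assign each the Weibull plotting-position exceedance probability
# p = rank / (n + 1). The flow values below are illustrative only.
def flow_duration_curve(flows):
    """Return (exceedance probability in %, flow) pairs, highest flow first."""
    ordered = sorted(flows, reverse=True)
    n = len(ordered)
    return [(100.0 * (rank + 1) / (n + 1), q) for rank, q in enumerate(ordered)]

if __name__ == "__main__":
    flows = [12.0, 3.5, 7.2, 20.1, 5.0, 9.8, 15.4, 2.2, 6.6, 11.3]
    for p, q in flow_duration_curve(flows):
        print(f"{p:5.1f}% of the time, flow >= {q} m^3/s")
```

Plotting the pairs on a log flow axis gives the familiar FDC shape; the hydrological interpretation is unchanged by the choice of language.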
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
S4 Table. Box plot and the statistical analysis for the diameters measured for the NCLPs obtained by AFM.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
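The point that many distributions can produce the same bar graph is easy to demonstrate numerically. A short Python sketch with fabricated data (not taken from the reviewed papers): two samples share the same mean, so their bar graphs would be identical, yet their shapes differ sharply:

```python
# Two fabricated samples with identical means: a bar graph of the means
# cannot distinguish them, while a scatterplot or box plot would.
from statistics import mean, stdev

symmetric = [4, 5, 5, 6, 6, 7, 7, 8]    # roughly bell-shaped around 6
bimodal   = [2, 2, 2, 3, 9, 9, 10, 11]  # two separated clusters, same mean

print(mean(symmetric), mean(bimodal))    # identical central tendency
print(stdev(symmetric), stdev(bimodal))  # very different spreads
```

Showing the raw points alongside the summary statistics is exactly what the univariate scatterplots recommended above accomplish.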
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
###############################################################################
### Source code and problem instances:
###############################################################################
- The source code of METAFOR will be made available on GitHub once the paper has been accepted for publication. In the meantime, the code is provided in METAFOR.zip.
- The list of problem instances (single-objective continuous functions) is provided in the attached file "instances.zip".
###############################################################################
###############################################################################
### Experiment folders:
###############################################################################
The folders "default", "leave25OUT", "leave25OUTCEC14", "leaveLDO" and "leaveLDOCEC14" (compressed in zip format to save space) contain all the data collected during the experiment. Each of them contains the following folders/files:
- "candidates/candidates.txt" -- a text file with the algorithms specified as command lines,
- "candidates/OUTPUT" -- a folder containing a text file with the best solutions found by each algorithm for each problem instance. The names of the files in the folder are composed of the test suite "c_<0,1,2,3>", the number of the function in the test suite "f_<0,...,n>", and the number of dimensions "d_<50,100,500,750,...,d>".
- "DataAndPlots" -- a folder created automatically by the "plot_bxp_rtd_wlx.sh" processing script (see below).
Inside this folder are the following subfolders:
- "DataAndPlots/Bxp" -- stores the box plots created based on the data in "DataAndPlots/Data";
- "DataAndPlots/Cvg" -- (if any) stores the convergence plots generated based on the data in "OUTPUT_processed";
- "DataAndPlots/Data" -- stores the processed data and statistical information (median, median error, statistical test, etc.) of the raw data stored in "candidates/OUTPUT";
- "DataAndPlots/Time" -- stores the average time taken by the algorithms on each problem instance.
- "DataAndPlots/Table" -- (if any) stores, in plain text and pseudo-LaTeX format, the tables of results reported in the paper, i.e., median, median error, median absolute deviation, rankings, statistical tests, and number of wins.
- "OUTPUT" -- stores the convergence data of the algorithms (i.e. function evaluations vs. solution quality).
In the latest version of METAFOR, each convergence file consists of 100 points; however, in the version we used for the experiments reported in the paper, each file consists of thousands of points per algorithm, making this folder particularly heavy.
The data in the folders "candidates/OUTPUT" and "OUTPUT" are gathered via the script "runMe.sh", indicating an experiment folder and an instances file, namely:
- in the case of "default", we solved instances "test_MIXTURE_max200.txt" and "test_MIXTURE_onlyLargeScale.txt";
- in the case of "leave25OUT", we solved instances "test_MIXTURE_max200.txt", "test_MIXTURE_onlyLargeScale.txt" and "test_MIXTURE_onlyLargeScaleDND.txt";
- in the case of "leave25OUTCEC14", we solved instances "test_CEC14.txt";
- in the case of "leaveLDO", we solved instances "test_MIXTURE_max200.txt", "test_MIXTURE_onlyLargeScale.txt" and "test_MIXTURE_onlyLargeScaleDND.txt";
- in the case of "leaveLDOCEC14", we solved instances "test_CEC14.txt".
###############################################################################
###############################################################################
### Processing scripts:
###############################################################################
The scripts folder (scripts.zip) contains the main processing script "plot_bxp_rtd_wlx.sh" and several auxiliary R and shell scripts: "boxplot.R", "cvg_log.R", "wilcoxon.R", "ranksPerClass.R", "filter_repeating.sh", "full_outer_join.sh", and "replace_na.sh". All the auxiliary scripts are automatically called by the main script, depending on the options specified by the user, but they can also be used standalone. The auxiliary R scripts are used to generate the box plots ("boxplot.R") and convergence plots ("cvg_log.R"), to perform the statistical test ("wilcoxon.R"), and to compute the rankings ("ranksPerClass.R"). The auxiliary shell scripts are used to clean the raw data stored in "OUTPUT" and create a file called data-*-mean.txt for each data file, which can be fed into "cvg_log.R" to generate the convergence plots.
###############################################################################
###############################################################################
### Folders with experiments:
###############################################################################
Since the paper reports different sets of algorithms solving different sets of problems, we created a folder (compressed in zip format for space reasons) for each of them and put in it only the specific data that we want to analyze and plot. The data inside these folders is simply copied from the main experiment folders, and it is as follows:
- "METAFOR/exp1_dftVStuned" contains the results discussed in section 5.3.1 of the paper.
- "METAFOR/exp2_mtfVSHyb" contains the results discussed in section 5.3.2 of the paper.
- "METAFOR/exp3_CEC14" and "METAFOR/exp4_LS" contain the results discussed in section 5.3.3 of the paper.
###############################################################################
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The SpeedMap analyzes hundreds of thousands of Internet speed test measurements by tarife.at as well as the RTR network test and then gives non-binding estimates of which speeds providers reach at a certain address. In addition, the raw data is visualized in the form of a statistical box plot.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Transparency in data visualization is an essential ingredient for scientific communication. The traditional approach of visualizing continuous quantitative data solely in the form of summary statistics (i.e., measures of central tendency and dispersion) has repeatedly been criticized for not revealing the underlying raw data distribution. Remarkably, however, systematic and easy-to-use solutions for raw data visualization using the most commonly reported statistical software package for data analysis, IBM SPSS Statistics, are missing. Here, a comprehensive collection of more than 100 SPSS syntax files and an SPSS dataset template is presented and made freely available that allow the creation of transparent graphs for one-sample designs, for one- and two-factorial between-subject designs, for selected one- and two-factorial within-subject designs as well as for selected two-factorial mixed designs and, with some creativity, even beyond (e.g., three-factorial mixed-designs). Depending on graph type (e.g., pure dot plot, box plot, and line plot), raw data can be displayed along with standard measures of central tendency (arithmetic mean and median) and dispersion (95% CI and SD). The free-to-use syntax can also be modified to match with individual needs. A variety of example applications of syntax are illustrated in a tutorial-like fashion along with fictitious datasets accompanying this contribution. The syntax collection is hoped to provide researchers, students, teachers, and others working with SPSS a valuable tool to move towards more transparency in data visualization.
50-year box plot experiment in Grossbeeren (1972-2022) - test components (for statistical evaluation).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Graphs are powerful and versatile data structures that can be used to represent a wide range of different types of information. In this paper, we introduce a method to analyze and then visualize an important class of data described over a graph, namely, ensembles of paths. Analysis of such path ensembles is useful in a variety of applications, in diverse fields such as transportation, computer networks, and molecular dynamics. The proposed method generalizes the concept of band depth to an ensemble of paths on a graph, which provides a center-outward ordering on the paths. This ordering is, in turn, used to construct a generalization of the conventional boxplot or whisker plot, called a path boxplot, which applies to paths on a graph. The utility of the path boxplot is demonstrated for several examples of path ensembles, including paths defined over computer networks and roads.
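One deliberately simplified reading of the idea can be sketched in Python: treat each path as its set of visited nodes, define the band of a pair of paths as the union of their node sets, and score each path by the fraction of bands that contain it entirely. This is only an illustration of the center-outward ordering concept, not the paper's actual definition of band depth for paths:

```python
# Toy band depth for paths on a graph (simplified interpretation, not the
# paper's formal definition): a path's depth is the fraction of pairs of
# other paths whose combined node set fully contains it.
from itertools import combinations

def path_band_depth(paths):
    """paths: list of node sequences. Returns one depth score per path."""
    sets = [set(p) for p in paths]
    depths = []
    for i, s in enumerate(sets):
        pairs = [(j, k) for j, k in combinations(range(len(sets)), 2)
                 if i not in (j, k)]
        inside = sum(1 for j, k in pairs if s <= sets[j] | sets[k])
        depths.append(inside / len(pairs) if pairs else 0.0)
    return depths

# Three overlapping paths plus an outlying detour through node "e":
paths = [["a", "b", "c"], ["a", "b", "d"], ["a", "c", "d"], ["a", "e"]]
print(path_band_depth(paths))  # the detour via "e" gets depth 0
```

Deep paths sit inside many bands (central), while paths visiting unusual nodes get low depth (outlying), mirroring the whisker/outlier split of a conventional boxplot.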
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The First WiFi-Based Localisation/Positioning Datasets for No-GPS Open Areas Using Smart Bins. There are two directories:
datasets - contains two main types of datasets:
1- Fingerprint dataset: fingerprint.csv contains fingerprints generated by four users with their mobile devices.
2- APs dataset: APs.csv is a large dataset containing auto-generated RSS values reported by APs; APs_users_date_time_label.csv contains the labelled data, with APs_four_users_label.csv covering the four users only.
scripts - contains all Jupyter notebooks used to create the datasets and to provide statistical analyses (normalisation, t-tests, etc.) and visualisations (histograms, box plots, etc.).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT: This paper shows how to apply the lattice package of R to create effective scientific graphs. The readers will learn basic notions of the package and ways to work with it in an easy way. The R code the paper provides will help them create various graphs, including a scatter plot, a box plot, a density plot, and a bar plot; with a little work, the code can be changed to make other graphs. The paper emphasizes the trellis display, a useful but still undervalued technique in scientific visualization.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes multiple columns:
A. a sequential student ID;
B. an ID that defines a random group label and the notation;
C. the notation used: User Story or Use Case;
D. the case they were assigned to: IFA, Sim, or Hos;
E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam;
F. a categorical representation of the grade (L/M/H), where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise;
G. the total number of classes in the student's conceptual model;
H. the total number of relationships in the student's conceptual model;
I. the total number of classes in the expert's conceptual model;
J. the total number of relationships in the expert's conceptual model;
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, and missing classes (see tagging scheme below);
P. the researchers' judgement of how well the student explained the derivation process: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present.
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either
with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is
incorrectly represented in the student model, either (i) via an attribute,
method, or relationship rather than a class, or (ii) using a generic term
(e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical
implementation aspect, e.g., access control. Classes that represent a legacy
system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in
CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in
CM-Expert.
All the calculations and information provided in the following sheets
originate from that raw data.
Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,
including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus of this study is on the number of classes; however, we also provide the size ratio for the number of relationships between the student and expert models.
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes that are correctly or incorrectly represented in the student model over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
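The two definitions above can be transcribed directly; the function names below are ours, but the formulas follow the text:

```python
# Direct transcription of the correctness and completeness definitions
# from the sheet description (AL, WR, SO, OM are counts per subject).
def correctness(al, wr, so, om):
    """Aligned classes over all tagged classes: AL / (AL + OM + SO + WR)."""
    return al / (al + om + so + wr)

def completeness(al, wr, om):
    """Represented (rightly or wrongly) over expert classes: (AL + WR) / (AL + WR + OM)."""
    return (al + wr) / (al + wr + om)

# Hypothetical subject: 12 aligned, 3 wrongly represented, 2 system-oriented, 3 omitted.
print(correctness(12, 3, 2, 3))  # 12 / 20 = 0.6
print(completeness(12, 3, 3))    # 15 / 18
```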
For sheet 4, as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and moderating variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (t-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation - UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case - SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by exam grade, converted to the categorical values High, Medium, and Low.
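The Hedges' g values at the bottom of Sheets 4-8 were obtained with the online tool mentioned above; as a cross-check, the conventional formula (Cohen's d on the pooled standard deviation, multiplied by the small-sample correction J = 1 - 3/(4(n1+n2) - 9)) can be sketched in Python. This is an independent re-implementation, not the tool's code:

```python
# Conventional Hedges' g: Cohen's d with pooled SD, times the small-sample
# bias correction J. Independent sketch; the study used an online calculator.
from math import sqrt
from statistics import mean, stdev

def hedges_g(x, y):
    n1, n2 = len(x), len(y)
    s1, s2 = stdev(x), stdev(y)
    pooled = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    d = (mean(x) - mean(y)) / pooled       # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)        # bias correction J
    return j * d

# Example with fabricated scores for two groups:
print(hedges_g([1, 2, 3, 4], [2, 3, 4, 5]))
```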
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This code plots box-and-whisker plots to evaluate the statistical distribution of radiogenic neodymium and strontium isotope values. The particular application is fingerprinting Potential Source Areas for dust generation in North Africa.
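The language of the plotting code is not stated here; the statistics behind any box-and-whisker plot, however, are the five-number summary and, commonly, Tukey's 1.5*IQR fences for flagging outlying values. A hedged Python sketch using one common quartile convention (median-split halves), with illustrative numbers rather than real isotope data:

```python
# Five-number summary plus Tukey fences, as drawn by a standard box plot.
# Quartiles here use the median of each half (one of several conventions).
from statistics import median

def box_stats(values):
    s = sorted(values)
    n = len(s)
    lower, upper = s[: n // 2], s[(n + 1) // 2:]
    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    return {"min": s[0], "q1": q1, "median": median(s), "q3": q3, "max": s[-1],
            "fences": (q1 - 1.5 * iqr, q3 + 1.5 * iqr)}  # outlier thresholds

print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```

Values outside the fences would be drawn as individual outlier points, which is exactly what makes box plots useful for spotting anomalous isotope measurements.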
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
¹ H = haematology, Cchem = clinical biochemistry, U = urine analysis, ROWt = relative organ weight. ² Sprague-Dawley. ³ Sprague-Dawley (Charles River). Papers analysed.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Effective communication of radiation exposure data is essential for improving safety management practices for radiological workers. However, traditional tabular formats used in reporting radiation exposure data often fail to convey critical patterns and trends, making it difficult for non-experts to interpret and act on the information. This study evaluates the application of data visualization techniques, including radar charts, box plots, sparklines, and Chernoff faces, to enhance the accessibility and comprehension of radiation exposure data. Using datasets from the “2022 Annual Report on Individual Exposure Doses of Radiological Workers” published by the KDCA, this study demonstrates how visualization can effectively highlight disparities across professions, demographic groups, and geographic regions. The findings underscore the significant potential of visualization methods in simplifying complex datasets, enabling stakeholders to make more informed decisions. Nonetheless, the study has limitations, including its reliance on pre-existing public datasets and a lack of real-time or granular data. Future research should focus on collecting primary data to explore causal relationships in radiation exposure trends and on applying advanced statistical and machine learning techniques to uncover deeper insights. By integrating robust visualization methods, this study aims to bridge the gap between raw data and actionable knowledge, ultimately contributing to safer occupational environments for radiological workers.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This study presents an interdisciplinary laboratory exercise that integrates STEAM (Science, Technology, Engineering, Arts, and Mathematics) education to evaluate vitamin C content in commercial supplements. Students employed three titration methods: direct titration with iodine, direct titration with potassium iodate, and back-titration with sodium thiosulfate. The activity emphasizes the development of essential analytical chemistry skills and the application of statistical techniques using software tools such as JASP and JAMOVI. Through this exercise, students learn to assess data normality, select appropriate hypothesis tests (e.g., ANOVA, Kruskal-Wallis, RMANOVA, Friedman test), and interpret results using visualizations such as box plots and raincloud plots. This multidisciplinary approach not only enhances students' understanding of chemical analysis but also refines their statistical interpretation abilities, preparing them for real-world scientific research and industry applications. The activity underscores the value of STEAM education in fostering critical thinking and problem-solving skills by bridging chemistry and statistics in a meaningful context.
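Of the tests listed, the activity runs Kruskal-Wallis in JASP/JAMOVI, but the H statistic itself is simple enough to re-implement as a teaching aid. A plain-Python sketch (average ranks for tied values, no tie-variance correction, so results may differ slightly from the software on heavily tied data):

```python
# Kruskal-Wallis H statistic: rank all observations jointly, then
# H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), where R_i is the rank sum
# of group i. Compare H to a chi-square with k-1 degrees of freedom.
def kruskal_wallis_h(*groups):
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # average rank for tied values
        i = j
    total = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

# Three hypothetical titration result groups (fabricated numbers):
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```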
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of results on the welded beam design problem.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of results on the speed reducer design problem.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of results on the three-bar truss design problem.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The results of Wilcoxon rank sum test on CEC2017 functions with D=30.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The parameters of the algorithms.