Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Individual participant data (IPD) meta-analyses that obtain "raw" data from studies rather than summary data typically adopt a "two-stage" approach to analysis, whereby IPD within trials generate summary measures, which are then combined using standard meta-analytical methods. Recently, a range of "one-stage" approaches, which combine all individual participant data in a single meta-analysis, have been suggested as providing a more powerful and flexible approach. However, they are more complex to implement and require statistical support. This study uses a dataset to compare "two-stage" and "one-stage" models of varying complexity, to ascertain whether results obtained from the two approaches differ in a clinically meaningful way.
Methods and Findings: We included data from 24 randomised controlled trials evaluating antiplatelet agents for the prevention of pre-eclampsia in pregnancy. We performed two-stage and one-stage IPD meta-analyses to estimate the overall treatment effect and to explore potential treatment interactions whereby particular types of women and their babies might benefit differentially from receiving antiplatelets. Two-stage and one-stage approaches gave similar results, showing a benefit of using antiplatelets (relative risk 0.90, 95% CI 0.84 to 0.97). Neither approach suggested that any particular type of woman benefited more or less from antiplatelets. There were no material differences in results between different types of one-stage model.
Conclusions: For these data, two-stage and one-stage approaches to analysis produce similar results. Although one-stage models offer a flexible environment for exploring model structure, and are useful where across-study patterns relating to types of participant, intervention and outcome mask similar relationships within trials, the additional insights provided by their usage may not outweigh the costs of statistical support for routine application in syntheses of randomised controlled trials. Researchers considering undertaking an IPD meta-analysis should not necessarily be deterred by a perceived need for sophisticated statistical methods when combining information from large randomised trials.
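To make the distinction concrete, the following is a minimal, illustrative Python sketch (not the study's analysis code, and using made-up 2x2 trial counts) of the "two-stage" route: each trial is first reduced to a log relative risk and its variance, and those trial summaries are then pooled with inverse-variance weights. A one-stage model would instead fit a single participant-level regression across all trials.

```python
# Illustrative two-stage pooling of trial-level relative risks (hypothetical counts).
import math

# Per-trial counts: (events_treatment, n_treatment, events_control, n_control)
trials = [(30, 200, 40, 200), (12, 150, 18, 150), (55, 400, 70, 400)]

log_rrs, variances = [], []
for a, n1, c, n0 in trials:
    rr = (a / n1) / (c / n0)
    log_rrs.append(math.log(rr))
    # Delta-method variance of the log relative risk for a 2x2 table
    variances.append(1 / a - 1 / n1 + 1 / c - 1 / n0)

weights = [1 / v for v in variances]
pooled_log_rr = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)
se = math.sqrt(1 / sum(weights))

print(f"Pooled RR = {math.exp(pooled_log_rr):.2f}, "
      f"95% CI {math.exp(pooled_log_rr - 1.96 * se):.2f} "
      f"to {math.exp(pooled_log_rr + 1.96 * se):.2f}")
```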
https://www.gnu.org/licenses/gpl-3.0-standalone.html
Replication Package for "A Study on the Pythonic Functional Constructs' Understandability" to appear at ICSE 2024
Authors: Cyrine Zid, Fiorella Zampetti, Giuliano Antoniol, Massimiliano Di Penta
Article Preprint: https://mdipenta.github.io/files/ICSE24_funcExperiment.pdf
Artifacts: https://doi.org/10.5281/zenodo.8191782
License: GPL V3.0
This package contains folders and files with code and data used in the study described in the paper. In the following, we first provide all fields required for the submission, and then report a detailed description of all repository folders.
Artifact Description
Purpose
The artifact is about a controlled experiment aimed at investigating the extent to which Pythonic functional constructs have an impact on source code understandability. The artifact archive contains:
The material to allow replicating the study (see Section Experimental-Material)
Raw quantitative results, working datasets, and scripts to replicate the statistical analyses reported in the paper. Specifically, the executable part of the replication package reproduces figures and tables of the quantitative analysis (RQ1 and RQ2) of the paper starting from the working datasets.
Spreadsheets used for the qualitative analysis (RQ3).
We apply for the following badges:
Available and Reusable: we provide all the material needed to replicate the experiment, as well as to perform the statistical analyses and the qualitative analyses (spreadsheets, in this case)
Provenance
Paper preprint link: https://mdipenta.github.io/files/ICSE24_funcExperiment.pdf
Artifacts: https://doi.org/10.5281/zenodo.8191782
Data
Results have been obtained by conducting the controlled experiment involving Prolific workers as participants. Data collection and processing followed a protocol approved by the University ethical board. Note that all data enclosed in the artifact is completely anonymized and does not contain sensitive information.
Further details about the provided dataset can be found in the Section Results' directory and files
Setup and Usage (for executable artifacts):
See the Section Scripts to reproduce the results, and instructions for running them
Experiment-Material/
Contains the material used for the experiment, and, specifically, the following subdirectories:
Google-Forms/
Contains (as PDF documents) the questionnaires submitted to the ten experimental groups.
Task-Sources/
Contains, for each experimental group (G-1...G-10), the sources used to produce the Google Forms, and, specifically: - The cover letter (Letter.docx). - A directory for each experimental task (Lambda 1, Lambda 2, Comp 1, Comp 2, MRF 1, MRF 2, Lambda Comparison, Comp Comparison, MRF Comparison). Each task directory contains the exercise text (in both Word and .txt format), the source code snippet, and its .png image to be used in the form. Note: the "Comparison" tasks do not have any exercise because their purpose is always the same, i.e., to compare the (perceived) understandability of the snippets and return the result of the comparison.
Code-Examples-Table1/
Contains the source code snippets used as objects of the study (the same you can find under "Task-Sources/"), named as reported in Table 1.
Results' directory and files
raw-responses/
Contains, as spreadsheets, the raw responses provided by the study participants through Google forms.
raw-results-RQ1/
Contains the raw results for RQ1. Specifically, the directory contains a subdirectory for each group (G1-G10). Each subdirectory contains: - For each user, a directory (named using their Prolific ID) containing, for each question (Q1-Q6), the produced Python code (Qn.py), its output (QnR.txt), and its stderr output (QnErr.txt). - "expected-outputs/": a directory containing the expected outputs for each task (Qn.txt).
working-results/RQ1-RQ2-files-for-statistical-analysis/
Contains three .csv files used as input for conducting the statistical analysis and drawing the graphs for addressing the first two research questions of the study. Specifically:
ConstructUsage.csv contains the declared usage frequency of the three functional constructs under study. This file is used to draw Figure 4. The file contains an entry for each participant, reporting the (text-coded) frequency of construct usage for Comprehension, Lambda, and MRF.
RQ1.csv contains the collected data used for the mixed-effect logistic regression relating the use of functional constructs to the correctness of the change task, as well as for the logistic regression relating the use of map/reduce/filter functions to the correctness of the change task. The csv file contains an entry for each answer provided by each subject, and features the following columns (an illustrative analysis sketch is given after the column list):
Group: experimental group to which the participant is assigned
User: user ID
Time: task time in seconds
Approvals: number of approvals on previous tasks performed on Prolific
Student: whether the participant declared themselves as a student
Section: section of the questionnaire (lambda, comp, or mrf)
Construct: specific construct being presented (same as "Section" for lambda and comp, for mrf it says whether it is a map, reduce, or filter)
Question: question id, from Q1 to Q6, indicating the ordering of the questions
MainFactor: main factor treatment for the given question - "f" for functional, "p" for procedural counterpart
Outcome: TRUE if the task was correctly performed, FALSE otherwise
Complexity: cyclomatic complexity of the construct (empty for mrf)
UsageFrequency: usage frequency of the given construct
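As an illustration of how RQ1.csv can be analyzed, here is a minimal sketch assuming the column names listed above. It is not the paper's analysis script: the paper uses a mixed-effect logistic regression, whereas this simplified version omits the per-participant random effect.

```python
# Sketch: logistic regression of change-task correctness on the main factor
# (functional vs. procedural counterpart), construct, and declared usage frequency.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("RQ1.csv")
# Outcome is TRUE/FALSE; convert it to a 0/1 indicator.
df["Correct"] = df["Outcome"].astype(str).str.upper().eq("TRUE").astype(int)

model = smf.logit("Correct ~ C(MainFactor) + C(Construct) + C(UsageFrequency)",
                  data=df).fit()
print(model.summary())
```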
RQ1Paired-RQ2.csv contains the collected data used for the ordinal logistic regression of the relationship between the perceived ease of understanding of the functional constructs and (i) participants' usage frequency, and (ii) the constructs' complexity (except for map/reduce/filter). The file features a row for each participant, and the columns are the following (an illustrative sketch follows the column list):
Group: experimental group to which the participant is assigned
User: user ID
Time: task time in seconds
Approvals: number of approvals on previous tasks performed on Prolific
Student: whether the participant declared themselves as a student
LambdaF: result for the change task related to a lambda construct
LambdaP: result for the change task related to the procedural counterpart of a lambda construct
CompF: result for the change task related to a comprehension construct
CompP: result for the change task related to the procedural counterpart of a comprehension construct
MrfF: result for the change task related to an MRF construct
MrfP: result for the change task related to the procedural counterpart of an MRF construct
LambdaComp: perceived understandability level for the comparison task (RQ2) between a lambda and its procedural counterpart
CompComp: perceived understandability level for the comparison task (RQ2) between a comprehension and its procedural counterpart
MrfComp: perceived understandability level for the comparison task (RQ2) between an MRF construct and its procedural counterpart
LambdaCompCplx: cyclomatic complexity of the lambda construct involved in the comparison task (RQ2)
CompCompCplx: cyclomatic complexity of the comprehension construct involved in the comparison task (RQ2)
MrfCompType: type of MRF construct (map, reduce, or filter) used in the comparison task (RQ2)
LambdaUsageFrequency: self-declared usage frequency on lambda constructs
CompUsageFrequency: self-declared usage frequency on comprehension constructs
MrfUsageFrequency: self-declared usage frequency on MRF constructs
LambdaComparisonAssessment: outcome of the manual assessment of the answer to the "check question" required for the lambda comparison ("yes" means valid, "no" means wrong, "moderatechatgpt" and "extremechatgpt" are the results of GPTZero)
CompComparisonAssessment: as above, but for comprehension
MrfComparisonAssessment: as above, but for MRF
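A minimal, hedged sketch of an ordinal logistic regression on one of the comparison ratings is shown below; column names follow the description above, and the replication package's own scripts should be preferred for the exact model specification.

```python
# Sketch: ordinal logistic regression of the perceived understandability of the lambda
# comparison (LambdaComp) on the cyclomatic complexity of the compared lambda.
# In practice the ordering of the response levels should be set explicitly.
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.read_csv("RQ1Paired-RQ2.csv")
y = df["LambdaComp"].astype(pd.CategoricalDtype(ordered=True))
X = df[["LambdaCompCplx"]].astype(float)

res = OrderedModel(y, X, distr="logit").fit(method="bfgs", disp=False)
print(res.summary())
```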
working-results/inter-rater-RQ3-files/
This directory contains four .csv files used as input for computing the inter-rater agreement for the manual labeling used for addressing RQ3. Specifically, you will find one file for each functional construct, i.e., comprehension.csv, lambda.csv, and mrf.csv, plus an additional file, procedural.csv, capturing the reasons why participants prefer the procedural paradigm.
working-results/RQ2ManualValidation.csv
This file contains the results of the manual validation performed to sanitize the answers provided by our participants and used for addressing RQ2. Specifically, we coded each behaviour description using four different levels: (i) correct ("yes"), (ii) somewhat correct ("partial"), (iii) wrong ("no"), and (iv) automatically generated. The file features a row for each participant, and the columns are the following:
ID: ID we used to refer to the participant in the paper's qualitative analysis
Group: experimental group to which the participant is assigned
ProlificID: user ID
Comparison for lambda construct description: answer provided by the user for the lambda comparison task
Final Classification: our assessment of the lambda comparison answer
Comparison for comprehension description: answer provided by the user for the comprehension comparison task
Final Classification: our assessment of the comprehension comparison answer
Comparison for MRF description: answer provided by the user for the MRF comparison task
Final Classification: our assessment of the MRF comparison answer
working-results/RQ3ManualValidation.xlsx
This file contains the results of the open coding applied to address our third research question. Specifically, you will find four sheets, one for each functional construct and one for the procedural paradigm. Each sheet reports the provided answers together with the categories assigned to them. Each sheet contains the following columns:
ID: ID we used to refer to the participant in the paper's qualitative analysis
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In our everyday lives, we are required to make decisions based upon our statistical intuitions. Often, these involve the comparison of two groups, such as luxury versus family cars and their suitability. Research has shown that the mean difference affects judgements where two sets of data are compared, but the variability of the data has only a minor influence, if any at all. However, prior research has tended to present raw data as simple lists of values. Here, we investigated whether displaying data visually, in the form of parallel dot plots, would lead viewers to incorporate variability information. In Experiment 1, we asked a large sample of people to compare two fictional groups (children who drank ‘Brain Juice’ versus water) in a one-shot design, where only a single comparison was made. Our results confirmed that only the mean difference between the groups predicted subsequent judgements of how much they differed, in line with previous work using lists of numbers. In Experiment 2, we asked each participant to make multiple comparisons, with both the mean difference and the pooled standard deviation varying across data sets they were shown. Here, we found that both sources of information were correctly incorporated when making responses. Taken together, we suggest that increasing the salience of variability information, through manipulating this factor across items seen, encourages viewers to consider this in their judgements. Such findings may have useful applications for best practices when teaching difficult concepts like sampling variation.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes the following columns:
A. a sequential student ID
B. an ID that defines a random group label and the notation
C. the used notation: User Story or Use Case
D. the case they were assigned to: IFA, Sim, or Hos
E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
F. a categorical representation of the grade (L/M/H), where H is greater than or equal to 80, M is at least 65 and below 80, and L otherwise
G. the total number of classes in the student's conceptual model
H. the total number of relationships in the student's conceptual model
I. the total number of classes in the expert's conceptual model
J. the total number of relationships in the expert's conceptual model
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, and missing (see tagging scheme below)
P. the researchers' judgement on how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present.
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than a class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent a legacy system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.
All the calculations and information provided in the following sheets
originate from that raw data.
Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,
including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The number of classes in the student model divided by the number of classes in the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes; however, we also provide the size ratio for the number of relationships between the student and expert models.
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
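For reference, the two definitions above translate directly into the following small Python functions (illustrative only, using made-up counts):

```python
# Correctness and completeness as defined above, from the per-student counts of
# aligned (AL), wrongly represented (WR), system-oriented (SO) and omitted (OM) classes.
def correctness(al, wr, so, om):
    return al / (al + om + so + wr)

def completeness(al, wr, om):
    return (al + wr) / (al + wr + om)

# Example with made-up counts:
print(correctness(al=12, wr=3, so=2, om=5))   # 12 / 22 = 0.545...
print(completeness(al=12, wr=3, om=5))        # 15 / 20 = 0.75
```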
For sheet 4 as well as for the following four sheets, diverging stacked bar
charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (t-test) and effect size (Hedges' g; a computational sketch is given after the sheet list below) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation - UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case - SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by exam grade, converted to the categorical values High, Medium, and Low.
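The sheets compute Hedges' g with the linked online tool; for readers who prefer to recompute it, the standard formula (standardized mean difference with a small-sample correction) is sketched below in Python, under the assumption of two independent groups of scores.

```python
import math

def hedges_g(x, y):
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)   # sample variances
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    s_pooled = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    d = (mx - my) / s_pooled                        # Cohen's d
    return d * (1 - 3 / (4 * (nx + ny) - 9))        # small-sample bias correction

# Example with made-up correctness values for two groups:
print(hedges_g([0.8, 0.7, 0.9, 0.6], [0.5, 0.6, 0.4, 0.7]))
```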
The main objective of the HEIS survey is to obtain detailed data on household expenditure and income, linked to various demographic and socio-economic variables, to enable computation of poverty indices and determine the characteristics of the poor and prepare poverty maps. Therefore, to achieve these goals, the sample had to be representative on the sub-district level. The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality.
Data collected through the survey helped in achieving the following objectives:
1. Provide data weights that reflect the relative importance of consumer expenditure items used in the preparation of the consumer price index
2. Study the consumer expenditure pattern prevailing in the society and the impact of demographic and socio-economic variables on those patterns
3. Calculate the average annual income of the household and the individual, and assess the relationship between income and different economic and social factors, such as profession and educational level of the head of the household and other indicators
4. Study the distribution of individuals and households by income and expenditure categories and analyze the factors associated with it
5. Provide the necessary data for the national accounts related to overall consumption and income of the household sector
6. Provide the necessary income data to serve in calculating poverty indices and identifying the characteristics of the poor, as well as drawing poverty maps
7. Provide the data necessary for the formulation, follow-up and evaluation of economic and social development programs, including those addressed to eradicate poverty
National
The survey covered a national sample of households and all individuals permanently residing in surveyed households.
Sample survey data [ssd]
The 2008 Household Expenditure and Income Survey sample was designed using a two-stage stratified cluster sampling method. In the first stage, the primary sampling units (PSUs), the blocks, were drawn with probability proportional to size, taking the number of households in each block as the block size. The second stage included drawing the household sample (8 households from each PSU) using the systematic sampling method. Four substitute households from each PSU were also drawn, using the systematic sampling method, to be used on the first visit to the block in case any of the main sample households could not be visited for any reason.
To estimate the sample size, the coefficient of variation and the design effect in each sub-district were calculated for the expenditure variable from the data of the 2006 Household Expenditure and Income Survey. These results were used to estimate the sample size at the sub-district level, provided that the coefficient of variation of the expenditure variable at the sub-district level did not exceed 10%, with a minimum of 6 clusters at the district level, to ensure good cluster representation in the administrative areas and to enable drawing poverty pockets.
It is worth mentioning that the expected non-response, as well as areas where poor families are concentrated in the major cities, were taken into consideration when designing the sample. Therefore, a larger sample was taken from these areas compared with other ones, in order to help in reaching and covering the poverty pockets.
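For illustration only (this is not the official survey software), the two selection stages described above can be sketched in Python as probability-proportional-to-size selection of blocks followed by systematic selection of 8 households per block; the sampling frame below is entirely hypothetical.

```python
import random

def pps_systematic(block_sizes, n_psus):
    """Systematic PPS selection of PSUs, using household counts as measures of size."""
    blocks = list(block_sizes)
    cumulative, total = [], 0
    for b in blocks:
        total += block_sizes[b]
        cumulative.append(total)
    interval = total / n_psus
    start = random.random() * interval
    selected, i = [], 0
    for k in range(n_psus):
        point = start + k * interval
        while cumulative[i] < point:
            i += 1
        selected.append(blocks[i])
    return selected

def systematic_sample(units, n=8):
    """Systematic selection of n units (households) within a selected block."""
    step = len(units) / n
    start = random.random() * step
    return [units[int(start + k * step)] for k in range(n)]

# Hypothetical frame: 200 blocks with 80-160 households each.
frame = {f"block{i}": random.randint(80, 160) for i in range(200)}
psus = pps_systematic(frame, n_psus=25)
households = systematic_sample(list(range(frame[psus[0]])), n=8)
print(psus[:5], households)
```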
Face-to-face [f2f]
List of survey questionnaires: (1) General Form (2) Expenditure on food commodities Form (3) Expenditure on non-food commodities Form
Raw Data: The design and implementation of this survey involved the following procedures:
1. Sample design and selection
2. Design of forms/questionnaires, guidelines to assist in filling out the questionnaires, and preparing instruction manuals
3. Design of the table templates to be used for the dissemination of the survey results
4. Preparation of the fieldwork phase, including printing forms/questionnaires, instruction manuals, data collection instructions, data checking instructions and codebooks
5. Selection and training of survey staff to collect data and run required data checks
6. Preparation and implementation of the pretest phase of the survey, designed to test and develop forms/questionnaires, instructions and software programs required for data processing and production of survey results
7. Data collection
8. Data checking and coding
9. Data entry
10. Data cleaning using data validation programs
11. Data accuracy and consistency checks
12. Data tabulation and preliminary results
13. Preparation of the final report and dissemination of final results
Harmonized Data:
- The Statistical Package for Social Science (SPSS) was used to clean and harmonize the datasets
- The harmonization process started with cleaning all raw data files received from the Statistical Office
- Cleaned data files were then all merged to produce one data file on the individual level containing all variables subject to harmonization
- A country-specific program was generated for each dataset to generate/compute/recode/rename/format/label harmonized variables
- A post-harmonization cleaning process was run on the data
- Harmonized data was saved on the household as well as the individual level, in SPSS, and converted to STATA format
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The archive contains the following components described below.
Directory "dependencies/":
Directory "plots/4_Equalization/":
Directory "plots/8_User_study/":
Directory "resources/BRIR_auralization/":
Directory "resources/BRIR_rendered/":
Directory "resources/HPCF_KEMAR/":
Directory "resources/User_study/":
Matlab script "x4_Gather_Headphone_Compensations.m":
Matlab script "x6_Gather_SSR_Configurations.m":
Readme file "x6a_Normalize_SSR_Loudnesses.txt":
Matlab script "x6_Gather_SSR_Configurations.m":
Shell script "x7_Start_Study_GUI.sh":
Matlab script "x8_Gather_Study_Data.m":
Matlab script "x8a_Plot_Study_Data.m":
R script "x8b_Analyze_Exp1_Data.R":
R markdown script "x8c_Plot_Exp1_Results.Rmd":
The performance standard for ballistic-resistant body armor published by the National Institute of Justice (NIJ), NIJ Standard 0101.06, recommends estimating the perforation performance of body armor by performing a statistical analysis on V50 ballistic limit testing data. The first objective of this study is to evaluate and compare the estimates of performance provided by different statistical methods applied to ballistic data generated in the laboratory. Three different distribution models able to describe the relationship between the projectile velocity and the probability of perforation are considered: the logistic, the probit, and the complementary log-log response models. A secondary objective of this study is to apply the different methods to a new body armor model with unusual ballistic limit results, leading one to suspect that it may not be best described by a symmetric model, to determine whether this data can be better fitted by a model other than the logistic model. This work has been published as NISTIR 7760, "Analysis of Three Different Regression Models to Estimate the Ballistic Performance of New and Environmentally Conditioned Body Armor." The raw data (ballistic limit data) associated with this prior publication is archived in this dataset.
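As a hedged illustration (not the NIST analysis code, and using made-up velocities and outcomes), the kind of model described above can be fitted by maximum likelihood as follows; swapping the logistic link for the normal CDF or the complementary log-log function gives the probit and cloglog variants.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical V50-style data: impact velocities (m/s) and perforation outcomes (0/1).
velocity = np.array([410, 420, 425, 430, 435, 440, 445, 450, 460, 470], dtype=float)
perforated = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1], dtype=float)
vc = velocity - velocity.mean()               # centre the covariate for stability

def link(eta):
    # Logistic response model; probit would use scipy.stats.norm.cdf(eta),
    # complementary log-log would use 1 - np.exp(-np.exp(eta)).
    return 1.0 / (1.0 + np.exp(-eta))

def neg_log_likelihood(params):
    a, b = params
    p = np.clip(link(a + b * vc), 1e-9, 1 - 1e-9)
    return -np.sum(perforated * np.log(p) + (1 - perforated) * np.log(1 - p))

fit = minimize(neg_log_likelihood, x0=np.array([0.0, 0.1]), method="Nelder-Mead")
a_hat, b_hat = fit.x
# For the (symmetric) logistic model, the 50% perforation velocity is where eta = 0.
print(f"Estimated V50 ~ {velocity.mean() - a_hat / b_hat:.1f} m/s")
```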
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the replication package for the paper titled "How Do Requirements Evolve During Elicitation? An Empirical Study Combining Interviews and App Store Analysis", by Alessio Ferrari, Paola Spoletini and Sourav Debnath.
The package contains the following folders and files.
/R-analysis
This is a folder containing all the R implementations of the statistical tests included in the paper, together with the source .csv files used to produce the results. Each R file has the same title as the associated .csv file. The titles of the files reflect the RQs as they appear in the paper. The association between R files and tables in the paper is as follows:
- RQ1-1-analyse-story-rates.R: Table 1, user story rates
- RQ1-1-analyse-role-rates.R: Table 1, role rates
- RQ1-2-analyse-story-category-phase-1.R: Table 3, user story category rates in phase 1 compared to original rates
- RQ1-2-analyse-role-category-phase-1.R: Table 5, role category rates in phase 1 compared to original rates
- RQ2.1-analysis-app-store-rates-phase-2.R: Table 8, user story and role rates in phase 2
- RQ2.2-analysis-percent-three-CAT-groups-ph1-ph2.R: Table 9, comparison of the categories of user stories in phase 1 and 2
- RQ2.2-analysis-percent-two-CAT-roles-ph1-ph2.R: Table 10, comparison of the categories of roles in phase 1 and 2.
The .csv files used for the statistical tests are also used to produce boxplots. The association between boxplot figures and files is as follows.
- RQ1-1-story-rates.csv: Figure 4
- RQ1-1-role-rates.csv: Figure 5
- RQ1-2-categories-phase-1.csv: Figure 8
- RQ1-2-role-category-phase-1.csv: Figure 9
- RQ2-1-user-story-and-roles-phase-2.csv: Figure 13
- RQ2.2-percent-three-CAT-groups-ph1-ph2.csv: Figure 14
- RQ2.2-percent-two-CAT-roles-ph1-ph2.csv: Figure 17
- IMG-only-RQ2.2-us-category-comparison-ph1-ph2.csv: Figure 15
- IMG-only-RQ2.2-frequent-roles.csv: Figure 18
NOTE: The last two .csv files do not have an associated statistical test, but are used solely to produce boxplots.
/Data-Analysis
This folder contains all the data used to answer the research questions.
RQ1.xlsx: includes all the data associated with the RQ1 subquestions, with two tabs for each subquestion (one for user stories and one for roles). The names of the tabs are self-explanatory.
RQ2.1.xlsx: includes all the data for the RQ2.1 subquestion. Specifically, it includes the following tabs:
- Data Source-US-category: for each category of user story, and for each analyst, there are two lines.
The first one reports the number of user stories in that category for phase 1, and the second one reports the
number of user stories in that category for phase 2, considering the specific analyst.
- Data Source-role: for each category of role, and for each analyst, there are two lines.
The first one reports the number of user stories in that role for phase 1, and the second one reports the
number of user stories in that role for phase 2, considering the specific analyst.
- RQ2.1 rates: reports the final rates for RQ2.1.
NOTE: The other tabs are used to support the computation of the final rates.
RQ2.2.xlsx: includes all the data for the RQ2.2 subquestion. Specifically, it includes the following tabs:
- Data Source-US-category: same as RQ2.1.xlsx
- Data Source-role: same as RQ2.1.xlsx
- RQ2.2-category-group: comparison between groups of categories in the different phases, used to produce Figure 14
- RQ2.2-role-group: comparison between role groups in the different phases, used to produce Figure 17
- RQ2.2-specific-roles-diff: difference between specific roles, used to produce Figure 18
NOTE: the other tabs are used to support the computation of the values reported in the tabs above.
RQ2.2-single-US-category.xlsx: includes the data for the RQ2.2 subquestion associated to single categories of user stories.
A separate tab is used given the complexity of the computations.
- Data Source-US-category: same as RQ2.1.xlsx
- Totals: total number of user stories for each analyst in phase 1 and phase 2
- Results-Rate-Comparison: difference between rates of user stories in phase 1 and phase 2, used to produce the file
"img/IMG-only-RQ2.2-us-category-comparison-ph1-ph2.csv", which is in turn used to produce Figure 15
- Results-Analysts: number of analysts using each novel category produced in phase 2, used to produce Figure 16.
NOTE: the other tabs are used to support the computation of the values reported in the tabs above.
RQ2.3.xlsx: includes the data for the RQ2.3 subquestion. Specifically, it includes the following tabs:
- Data Source-US-category: same as RQ2.1.xlsx
- Data Source-role: same as RQ2.1.xlsx
- RQ2.3-categories: novel categories produced in phase 2, used to produce Figure 19
- RQ2-3-most-frequent-categories: most frequent novel categories
/Raw-Data-Phase-I
The folder contains one Excel file for each analyst, s1.xlsx...s30.xlsx, plus the file of the original user stories with annotations (original-us.xlsx). Each file contains two tabs:
- Evaluation: includes the annotation of the user stories as existing user stories in the original categories (annotated with "E"), novel user stories in a certain category (refinement, annotated with "N"), or novel user stories in a novel category (name of the category in column "New Feature"). NOTE 1: in the paper, the "refinement" case is said to be annotated with "R" (instead of "N", as in the files) to make the paper clearer and easier to read.
- Roles: roles used in the user stories, and count of the user stories belonging to a certain role.
/Raw-Data-Phaes-II
The folder contains one Excel file for each analyst, s1.xlsx...s30.xlsx. Each file contains two tabs:
- Analysis: includes the annotation of the user stories as belonging to existing original
category (X), or to categories introduced after interviews, or to categories introduced
after app store inspired elicitation (name of category in "Cat. Created in PH1"), or to
entirely novel categories (name of category in "New Category").
- Roles: roles used in the user stories, and count of the user stories belonging to a certain role.
/Figures
This folder includes the figures reported in the paper. The boxplots are generated from the
data using the tool http://shiny.chemgrid.org/boxplotr/. The histograms and other plots are
produced with Excel, and are also reported in the excel files listed above.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description of Source Code and Raw Data
Overview: The provided source code and raw data files are designed for evaluating the performance of a proposed algorithm using 15 widely recognized benchmark functions. These functions are critical for assessing the algorithm's efficiency, robustness, and effectiveness in optimization tasks. The evaluation is conducted across three different dimensions: 10, 20, and 30, providing a comprehensive analysis of the algorithm's capability to handle varying complexities.
Components:
1. Source Code:
The source code is implemented to execute the proposed algorithm on the benchmark functions. It includes modules for initializing populations, applying genetic operations (selection, crossover, mutation), and measuring performance metrics such as fitness value, convergence rate, and computational time.
The code is adaptable for different dimensional settings (10, 20, 30 dimensions) and can be easily modified to adjust parameters such as population size, iteration count, and genetic operators' specifics.
The algorithm is tested against a suite of 15 benchmark functions, each representing a unique challenge in the optimization landscape, including unimodal, multimodal, separable, and non-separable functions.
2. Raw Data:
The raw data consists of the results generated by running the proposed algorithm on each benchmark function across the three dimensional settings (10, 20, and 30 dimensions).
Data includes multiple runs to ensure statistical significance, capturing metrics like the best and average fitness values, standard deviation, and convergence behavior over the iterations.
This data is crucial for performing comparative analysis, highlighting the strengths and weaknesses of the proposed algorithm relative to existing methods.
Benchmark Functions:
· The 15 benchmark functions include a mix of well-known test cases such as Sphere, Rosenbrock, Rastrigin, Ackley, and others. Each function is crafted to test different aspects of optimization algorithms, from dealing with high-dimensional search spaces to escaping local optima.
· The functions are provided in the standard mathematical form, and the source code includes implementation details for each.
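For orientation, three of the named functions are sketched below in Python (hedged: the exact variants, shifts, and bounds used in the provided source code may differ).

```python
import math

def sphere(x):
    return sum(v * v for v in x)

def rastrigin(x):
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)

def ackley(x):
    n = len(x)
    s1 = sum(v * v for v in x) / n
    s2 = sum(math.cos(2 * math.pi * v) for v in x) / n
    return -20 * math.exp(-0.2 * math.sqrt(s1)) - math.exp(s2) + 20 + math.e

for f in (sphere, rastrigin, ackley):
    print(f.__name__, f([0.0] * 10))   # all three have their global optimum 0 at the origin
```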
Purpose:
· The primary goal of this package is to validate the effectiveness of the proposed algorithm against standard benchmarks in the field. The source code enables reproducibility of results, while the raw data serves as a baseline for further research and comparison with other optimization techniques.
Usage:
· Researchers can use the provided source code to replicate the experiments or adapt the algorithm for other benchmark functions or dimensional settings.
· The raw data can be analyzed using statistical tools to derive insights into the algorithm's performance across different scenarios.
This data publication contains all material used in Hötte, K., 2019, "Skill transferability and the adoption of new technology: A learning based explanation for patterns of diffusion".
It is composed of (1) the simulation model and the required inputs to reproduce the results, (2) the simulated data presented in the article, (3) the R scripts that were used for the statistical analyses, and (4) selected results and graphics that are partly used in the article and partly supplementary.
Please check for software updates (concerning the model and R code) on GitLab. If you are only interested in the programming code, I recommend checking out GitLab first, because this data publication consumes a lot of disk space due to the large amount of simulated data (~16 GB).
If you have questions, do not hesitate to send me an email: kerstin.hoette[at]uni-bielefeld.de
Technological capabilities are decisive for making effective use of new machinery and capital goods. Firms and employees accumulate these capabilities when working with specific machinery. Radical innovation differs by technology type, and pre-existing capabilities may be imperfectly transferable across types. In this paper, I address the implications of cross-technology transferability of capabilities for firm-level technology adoption and macroeconomic directed technological change. I propose a microeconomically founded model of technological learning that is based on empirical and theoretical insights from the innovation literature. In a simulation study using the ABM Eurace@unibi-eco, applied to the context of green technology diffusion, it is shown that a high transferability of knowledge has ambiguous effects. It accelerates the diffusion process initially, but comes at the cost of technological stability and specialization. For firms, it is easy to adopt, but also easy to switch back to the conventional technology type. It is shown how different types of policies can be used to stabilize the diffusion process. The framework of analysis is used to derive a general characterization of technologies that may provide guidance for future empirical analyses.
See also readme files in the subfolders.
The data provided should allow you to REPRODUCE the simulations, i.e. to produce your own simulation data that should exhibit the same patterns as those discussed in the paper.
my_library_functions.c: Running order of vintages adjusted by using costs.
its: initial population
This data allows you to perform STATISTICAL ANALYSES with the simulation output yourself. You may use these as input to the Rcode.
Experiment folders contain simulation files and simulation output:
- baseline -> with intermediate technological difficulty and distance
- difficulty -> 3 discrete levels of chi^{dist}
- distance -> 3 discrete levels of chi^{int}
monte_carlo_exp:
- both_learning_at_random_3_barr -> Monte Carlo analysis (MC) with fixed barrier and randomly drawn learning parameters
- both_learning_at_random_random_barr -> MC with random learning and a random barrier at max 10 pct; serves as policy baseline
- rand_learn_rand_pol_rand_barr10 -> policy experiment
In principle, you should be able to reproduce the simulated data (Note that the model has stochastic components, hence it will not be EXACTLY the same but sufficiently similar).
This documentation makes the STATISTICAL METHODS used in the paper transparent. Sorry for the inefficient code.
Output of analysed data and additional time series plots. Here, you find additional time series that are not presented in the main article, some descriptive statistics, and the output of statistical tests and analyses, i.e., regression output and Wilcoxon test results in txt format, as well as plots that are used in the paper. These files can be reproduced by the R code.
The author gratefully acknowledges the achievements and provision of free statistical software maintained by the R programming community. This work uses a modified version of the Eurace@Unibi model.
The basic goal of this survey is to provide the necessary database for formulating national policies at various levels. It represents the contribution of the household sector to the Gross National Product (GNP). Household Surveys help as well in determining the incidence of poverty, and providing weighted data which reflects the relative importance of the consumption items to be employed in determining the benchmark for rates and prices of items and services. Generally, the Household Expenditure and Consumption Survey is a fundamental cornerstone in the process of studying the nutritional status in the Palestinian territory.
The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality. Data is a public good, in the interest of the region, and it is consistent with the Economic Research Forum's mandate to make micro data available, aiding regional research on this important topic.
The survey data covers urban, rural and camp areas in West Bank and Gaza Strip.
1- Household/families. 2- Individuals.
The survey covered all the Palestinian households who are a usual residence in the Palestinian Territory.
Sample survey data [ssd]
The sampling frame consists of all enumeration areas which were enumerated in 1997; the enumeration area consists of buildings and housing units and is composed of an average of 120 households. The enumeration areas were used as Primary Sampling Units (PSUs) in the first stage of the sampling selection. The enumeration areas of the master sample were updated in 2003.
The sample is a stratified cluster systematic random sample with two stages: First stage: selection of a systematic random sample of 299 enumeration areas. Second stage: selection of a systematic random sample of 12-18 households from each enumeration area selected in the first stage. A person (18 years or older) was selected from each household in the second stage.
The population was divided by: 1- Governorate 2- Type of Locality (urban, rural, refugee camps)
The calculated sample size is 3,781 households.
The target cluster size or "sample-take" is the average number of households to be selected per PSU. In this survey, the sample take is around 12 households.
Detailed information/formulas on the sampling design are available in the user manual.
Face-to-face [f2f]
The PECS questionnaire consists of two main sections:
First section: Certain articles/provisions of the form are filled out at the beginning of the month, and the remainder is filled out at the end of the month. The questionnaire includes the following provisions:
Cover sheet: It contains detailed particulars of the family, the date of visit, particulars of the field/office work team, and the number/sex of the family members.
Statement of the family members: Contains social, economic and demographic particulars of the selected family.
Statement of the long-lasting commodities and income generation activities: Includes a number of basic and indispensable items (e.g., livestock or agricultural land).
Housing Characteristics: Includes information and data pertaining to the housing conditions, including type of shelter, number of rooms, ownership, rent, water, electricity supply, connection to the sewer system, source of cooking and heating fuel, and remoteness/proximity of the house to education and health facilities.
Monthly and Annual Income: Data pertaining to the income of the family is collected from different sources at the end of the registration / recording period.
Second section: The second section of the questionnaire includes a list of 54 consumption and expenditure groups, itemized and serially numbered according to their importance to the family. Each of these groups contains important commodities. The total number of commodity and service items across all groups is 667. Groups 1-21 include food, drink, and cigarettes. Group 22 includes homemade commodities. Groups 23-45 include all items except food, drink and cigarettes. Groups 50-54 include all of the long-lasting commodities. Data on each of these groups was collected over different intervals of time so as to reflect expenditure over a period of one full year.
Both data entry and tabulation were performed using the ACCESS and SPSS software programs. The data entry process was organized in 6 files, corresponding to the main parts of the questionnaire. A data entry template was designed to reflect an exact image of the questionnaire, and included various electronic checks: logical check, range checks, consistency checks and cross-validation. Complete manual inspection was made of results after data entry was performed, and questionnaires containing field-related errors were sent back to the field for corrections.
The survey sample consists of about 3,781 households interviewed over a twelve-month period between January 2004 and January 2005. There were 3,098 households that completed the interview, of which 2,060 were in the West Bank and 1,038 were in the Gaza Strip. The response rate was 82% in the Palestinian Territory.
The calculations of standard errors for the main survey estimations enable the user to identify the accuracy of estimations and the survey reliability. Total errors of the survey can be divided into two kinds: statistical errors, and non-statistical errors. Non-statistical errors are related to the procedures of statistical work at different stages, such as the failure to explain questions in the questionnaire, unwillingness or inability to provide correct responses, bad statistical coverage, etc. These errors depend on the nature of the work, training, supervision, and conducting all various related activities. The work team spared no effort at different stages to minimize non-statistical errors; however, it is difficult to estimate numerically such errors due to absence of technical computation methods based on theoretical principles to tackle them. On the other hand, statistical errors can be measured. Frequently they are measured by the standard error, which is the positive square root of the variance. The variance of this survey has been computed by using the “programming package” CENVAR.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description. This is the data used in the experiment of the following conference paper:
N. Arınık, R. Figueiredo, and V. Labatut, “Signed Graph Analysis for the Interpretation of Voting Behavior,” in International Conference on Knowledge Technologies and Data-driven Business - International Workshop on Social Network Analysis and Digital Humanities, Graz, AT, 2017, vol. 2025. ⟨hal-01583133⟩
Source code. The source code is accessible on GitHub: https://github.com/CompNet/NetVotes
Citation. If you use the data or source code, please cite the above paper.
@InProceedings{Arinik2017, author = {Arınık, Nejat and Figueiredo, Rosa and Labatut, Vincent}, title = {Signed Graph Analysis for the Interpretation of Voting Behavior}, booktitle = {International Conference on Knowledge Technologies and Data-driven Business - International Workshop on Social Network Analysis and Digital Humanities}, year = {2017}, volume = {2025}, series = {CEUR Workshop Proceedings}, address = {Graz, AT}, url = {http://ceur-ws.org/Vol-2025/paper_rssna_1.pdf},}
Details.
COMPARISON RESULTS
The 'material-stats' folder contains all the comparison results obtained for Ex-CC and ILS-CC. The csv files associated with the plots are also provided. The folder structure is as follows:
* material-stats/
** execTimePerf: The plot shows the execution time of Ex-CC and ILS-CC on randomly generated complete networks of different sizes.
** graphStructureAnalysis: The plots show the weight and link statistics for all instances.
** ILS-CC-vs-Ex-CC: The folder contains 4 different comparisons between Ex-CC and ILS-CC: imbalance difference, number of detected clusters, difference in the number of detected clusters, and NMI (Normalized Mutual Information).
Funding: Agorantic FR 3621, FMJH Program Gaspard Monge in optimization and operations research (Project 2015-2842H)
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Many initiatives encourage investigators to share their raw datasets in hopes of increasing research efficiency and quality. Despite these investments of time and money, we do not have a firm grasp of who openly shares raw research data, who doesn't, and which initiatives are correlated with high rates of data sharing. In this analysis I use bibliometric methods to identify patterns in the frequency with which investigators openly archive their raw gene expression microarray datasets after study publication. Automated methods identified 11,603 articles published between 2000 and 2009 that describe the creation of gene expression microarray data. Associated datasets in best-practice repositories were found for 25% of these articles, increasing from less than 5% in 2001 to 30%-35% in 2007-2009. Accounting for sensitivity of the automated methods, approximately 45% of recent gene expression studies made their data publicly available. First-order factor analysis on 124 diverse bibliometric attributes of the data creation articles revealed 15 factors describing authorship, funding, institution, publication, and domain environments. In multivariate regression, authors were most likely to share data if they had prior experience sharing or reusing data, if their study was published in an open access journal or a journal with a relatively strong data sharing policy, or if the study was funded by a large number of NIH grants. Authors of studies on cancer and human subjects were least likely to make their datasets available. These results suggest research data sharing levels are still low and increasing only slowly, and data is least available in areas where it could make the biggest impact. Let's learn from those with high rates of sharing to embrace the full potential of our research output.
The Associated Press is sharing data from the COVID Impact Survey, which provides statistics about physical health, mental health, economic security and social dynamics related to the coronavirus pandemic in the United States.
Conducted by NORC at the University of Chicago for the Data Foundation, the probability-based survey provides estimates for the United States as a whole, as well as in 10 states (California, Colorado, Florida, Louisiana, Minnesota, Missouri, Montana, New York, Oregon and Texas) and eight metropolitan areas (Atlanta, Baltimore, Birmingham, Chicago, Cleveland, Columbus, Phoenix and Pittsburgh).
The survey is designed to allow for an ongoing gauge of public perception, health and economic status to see what is shifting during the pandemic. When multiple sets of data are available, it will allow for the tracking of how issues ranging from COVID-19 symptoms to economic status change over time.
The survey is focused on three core areas of research:
Instead, use our queries linked below or statistical software such as R or SPSS to weight the data.
If you'd like to create a table to see how people nationally or in your state or city feel about a topic in the survey, use the survey questionnaire and codebook to match a question (the variable label) to a variable name. For instance, "How often have you felt lonely in the past 7 days?" is variable "soc5c".
Nationally: Go to this query and enter soc5c as the variable. Hit the blue Run Query button in the upper right hand corner.
Local or State: To find figures for that response in a specific state, go to this query and type in a state name and soc5c as the variable, and then hit the blue Run Query button in the upper right hand corner.
The resulting sentence you could write out of these queries is: "People in some states are less likely to report loneliness than others. For example, 66% of Louisianans report feeling lonely on none of the last seven days, compared with 52% of Californians. Nationally, 60% of people said they hadn't felt lonely."
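Equivalently, a weighted tabulation can be computed locally; the sketch below is a hedged pandas example in which both the file name and the weight column name are placeholders that must be taken from the delivered files and the codebook.

```python
import pandas as pd

# Placeholder file and weight-column names; use the delivered data file and the
# weight variable documented in the codebook.
df = pd.read_csv("01_April_30_covid_impact_survey.csv")

weighted_pct = (df.groupby("soc5c")["national_weight"].sum()
                / df["national_weight"].sum() * 100).round(1)
print(weighted_pct)   # weighted percentage of respondents per soc5c response category
```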
The margin of error for the national and regional surveys is found in the attached methods statement. You will need the margin of error to determine if the comparisons are statistically significant. If the difference is:
The survey data will be provided under embargo in both comma-delimited and statistical formats.
Each set of survey data will be numbered and have the date the embargo lifts in front of it in the format of: 01_April_30_covid_impact_survey. The survey has been organized by the Data Foundation, a non-profit, non-partisan think tank, and is sponsored by the Federal Reserve Bank of Minneapolis and the Packard Foundation. It is conducted by NORC at the University of Chicago, a non-partisan research organization. (NORC is not an abbreviation; it is part of the organization's formal name.)
Data for the national estimates are collected using the AmeriSpeak Panel, NORC’s probability-based panel designed to be representative of the U.S. household population. Interviews are conducted with adults age 18 and over representing the 50 states and the District of Columbia. Panel members are randomly drawn from AmeriSpeak with a target of achieving 2,000 interviews in each survey. Invited panel members may complete the survey online or by telephone with an NORC telephone interviewer.
Once all the study data have been made final, an iterative raking process is used to adjust for any survey nonresponse as well as any noncoverage or under- and oversampling resulting from the study-specific sample design. Raking variables include age, gender, census division, race/ethnicity, education, and county groupings based on county-level counts of the number of COVID-19 deaths. Demographic weighting variables were obtained from the 2020 Current Population Survey. The count of COVID-19 deaths by county was obtained from USA Facts. The weighted data reflect the U.S. population of adults age 18 and over.
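For readers unfamiliar with raking, the following is a minimal, illustrative Python sketch of iterative proportional fitting on two margins with made-up targets; NORC's production weighting involves more margins and additional adjustments.

```python
import pandas as pd

def rake(df, weight_col, margins, iterations=50):
    """Iterative raking: margins maps column name -> {category: target population share}."""
    w = df[weight_col].astype(float).copy()
    for _ in range(iterations):
        for col, targets in margins.items():
            current = w.groupby(df[col]).sum() / w.sum()          # current weighted shares
            factors = {cat: targets[cat] / current[cat] for cat in targets}
            w = w * df[col].map(factors)                          # adjust toward targets
    return w

# Hypothetical example with made-up respondents and targets:
df = pd.DataFrame({
    "gender": ["f", "m", "f", "m", "f", "m"],
    "age_grp": ["18-44", "18-44", "45+", "45+", "45+", "18-44"],
    "base_weight": [1.0] * 6,
})
targets = {"gender": {"f": 0.51, "m": 0.49}, "age_grp": {"18-44": 0.45, "45+": 0.55}}
df["raked_weight"] = rake(df, "base_weight", targets)
print(df)
```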
Data for the regional estimates are collected using a multi-mode address-based sampling (ABS) approach that allows residents of each area to complete the interview via web or with an NORC telephone interviewer. All sampled households are mailed a postcard inviting them to complete the survey either online using a unique PIN or via telephone by calling a toll-free number. Interviews are conducted with adults age 18 and over with a target of achieving 400 interviews in each region in each survey. Additional details on the survey methodology and the survey questionnaire are attached below or can be found at https://www.covid-impact.org.
Results should be credited to the COVID Impact Survey, conducted by NORC at the University of Chicago for the Data Foundation.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Verbal and Quantitative Reasoning GRE scores and percentiles were collected by querying the student database for the appropriate information. Any student records that were missing data such as GRE scores or grade point average were removed from the study before the data were analyzed. The GRE scores of entering doctoral students from 2007-2012 were collected and analyzed. A total of 528 student records were reviewed. Ninety-six records were removed because they lacked GRE scores: thirty-nine of these belonged to MD/PhD applicants who were not required to take the GRE to be reviewed for admission, and the remaining fifty-seven did not have an admissions committee score in the database. After 2011, the GRE's scoring system was changed from a scale of 200-800 points per section to 130-170 points per section. As a result, 12 more records were removed because their scores were reported on the new scoring system and therefore could not be compared with the older scores on a raw-score basis. After removal of these 108 records from our analyses, a total of 420 student records remained, which included students who were currently enrolled, left the doctoral program without a degree, or left the doctoral program with an MS degree. To maintain consistency among the participants, we removed 100 additional records so that our analyses only considered students who had graduated with a doctoral degree. In addition, thirty-nine admissions scores were identified as outliers by statistical analysis software and removed, for a final data set of 286 (see Outliers below).

Outliers. We used the automated ROUT method included in the PRISM software to test the data for the presence of outliers which could skew our data. The false discovery rate for outlier detection (Q) was set to 1%. After removing the 96 students without a GRE score, 432 students were reviewed for the presence of outliers. ROUT detected 39 outliers that were removed before statistical analysis was performed.

Sample. See the detailed description in the Participants section. Linear regression analysis was used to examine potential trends between GRE scores, GRE percentiles, normalized admissions scores or GPA and outcomes between selected student groups. The D'Agostino & Pearson omnibus and Shapiro-Wilk normality tests were used to test for normality of outcomes in the sample. The Pearson correlation coefficient was calculated to determine the relationship between GRE scores, GRE percentiles, admissions scores or GPA (undergraduate and graduate) and time to degree. Candidacy exam results were divided into students who either passed or failed the exam. A Mann-Whitney test was then used to test for statistically significant differences in mean GRE scores, percentiles, and undergraduate GPA between candidacy exam results. Other variables were also observed, such as gender, race, ethnicity, and citizenship status within the samples.

Predictive Metrics. The input variables used in this study were GPA and the scores and percentiles of applicants on both the Quantitative and Verbal Reasoning GRE sections. GRE scores and percentiles were examined to normalize variances that could occur between tests.

Performance Metrics. The output variables used in the statistical analyses of each data set were either the amount of time it took for each student to earn their doctoral degree, or the student's candidacy examination result.
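The analyses above were carried out in PRISM and related software. As a rough orientation only, the following R sketch shows the same families of tests (normality check, Pearson correlation, linear regression, Mann-Whitney) applied to a hypothetical data frame grads with columns gre_quant, gre_verbal, ugpa, time_to_degree, and candidacy. It is not the authors' code, and the ROUT outlier step (a PRISM feature) is not reproduced.

```r
# Illustrative sketch of the test families described above (not the original code).
# 'grads' and its columns are hypothetical.
shapiro.test(grads$time_to_degree)                      # normality of the outcome
cor.test(grads$gre_quant, grads$time_to_degree,
         method = "pearson")                            # Pearson correlation
summary(lm(time_to_degree ~ gre_quant + gre_verbal + ugpa,
           data = grads))                               # linear regression trends
wilcox.test(gre_quant ~ candidacy, data = grads)        # Mann-Whitney: GRE score by pass/fail
```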
This dataset contains all data and code required to clean the data, fit the models, and create the figures and tables for the laboratory experiment portion of the manuscript:

Kannan, N., Q. D. Read, and W. Zhang. 2024. A natural polymer material as a pesticide adjuvant for mitigating off-target drift and protecting pollinator health. Heliyon, in press. https://doi.org/10.1016/j.heliyon.2024.e35510.

In this dataset, we archive results from several laboratory and field trials testing different adjuvants (spray additives) that are intended to reduce particle drift, increase particle size, and slow down the particles from pesticide spray nozzles. We fit statistical models to the droplet size and speed distribution data and statistically compare different metrics between the adjuvants (sodium alginate, polyacrylamide [PAM], and control without any adjuvants).

The following files are included:
- RawDataPAMsodAlgOxfLsr.xlsx: Raw data for primary analyses
- OrganizedDataPaperRevision20240614.xlsx: Raw data to produce density plots presented in Figs. 8 and 9
- raw_data_readme.md: Markdown file with description of the raw data files
- R_code_supplement.R: All R code required to reproduce primary analyses
- R_code_supplement2.R: R code required to produce density plots presented in Figs. 8 and 9

Intermediate R output files are also included so that tables and figures can be recreated without having to rerun the data preprocessing, model fitting, and posterior estimation steps:
- pam_cleaned.RData: Data combined into clean R data frames for analysis
- velocityscaledlogdiamfit.rds: Fitted brms model object for velocity
- lnormfitreduced.rds: Fitted brms model object for diameter distribution
- emm_con_velo_diam_draws.RData: Posterior distributions of estimated marginal means for velocity
- emm_con_draws.RData: Posterior distributions of estimated marginal means for diameter distribution

The following software and package versions were used:
- R version 4.3.1
- CmdStan version 2.33.1
- R packages: brms version 2.20.5, cmdstanr version 0.5.3, fitdistrplus version 1.1-11, tidybayes version 3.0.4, emmeans version 1.8.9
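As an aid to reuse, here is a minimal R sketch showing how the archived intermediate objects listed above might be loaded and inspected without rerunning the preprocessing or model fitting. The file names are taken from the list above; the exact contents of the loaded objects are documented in the accompanying R scripts.

```r
# Minimal sketch: load archived intermediates instead of refitting
# (file names are those listed above).
library(brms)

load("pam_cleaned.RData")                            # cleaned R data frames
velo_fit <- readRDS("velocityscaledlogdiamfit.rds")  # fitted brms model for velocity
diam_fit <- readRDS("lnormfitreduced.rds")           # fitted brms model for diameter

summary(velo_fit)                                    # posterior summaries for the velocity model

load("emm_con_velo_diam_draws.RData")                # posterior draws of marginal means (velocity)
load("emm_con_draws.RData")                          # posterior draws of marginal means (diameter)
```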
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset and Octave/MATLAB code/scripts for data analysis.

Background: Methods for p-value correction are criticized for either increasing Type II error or improperly reducing Type I error. This problem is worse when dealing with thousands or even hundreds of paired comparisons between waves or images which are performed point-to-point. This text considers patterns in probability vectors resulting from multiple point-to-point comparisons between two event-related potential (ERP) waves (mass univariate analysis) to correct p-values, where clusters of significant p-values may indicate true H0 rejection.

New method: We used ERP data from normal subjects and subjects with attention deficit hyperactivity disorder (ADHD) under a cued forced two-choice test to study attention. The decimal logarithm of the p-vector (p') was convolved with a Gaussian window whose length was set as the shortest lag above which the autocorrelation of each ERP wave may be assumed to have vanished. To verify the reliability of the present correction method, we ran Monte-Carlo (MC) simulations to (1) evaluate confidence intervals of rejected and non-rejected areas of our data, (2) evaluate differences between corrected and uncorrected p-vectors or simulated ones in terms of the distribution of significant p-values, and (3) empirically verify the rate of Type I error (comparing 10,000 pairs of mixed samples with control and ADHD subjects).

Results: The present method reduced the range of p'-values that did not show covariance with neighbors (Type I and also Type II errors). The differences between the simulation or raw p-vector and the corrected p-vectors were, respectively, minimal and maximal when the window length for the p-vector convolution was set by autocorrelation.

Comparison with existing methods: Our method was less conservative, while FDR methods rejected essentially all significant p-values for the Pz and O2 channels. The MC simulations, the gold-standard method for error correction, showed a 2.78±4.83% difference (across all 20 channels) from the corrected p-vector, while the difference between the raw and corrected p-vectors was 5.96±5.00% (p = 0.0003).

Conclusion: As a cluster-based correction, the present new method, which adopts adaptive parameters to set the correction, seems to be biologically and statistically suitable for correcting p-values in mass univariate analysis of ERP waves.
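The archived analysis scripts are written in Octave/MATLAB; purely as an illustration of the core idea (smoothing the log10 p-vector with a Gaussian window whose length is tied to the autocorrelation lag), here is a short R sketch. The inputs pvec and lag0 are hypothetical, and the thresholding comment is a simplification of the full procedure.

```r
# Illustrative sketch (the archived code is Octave/MATLAB): convolve the
# log10 p-value vector with a normalized Gaussian window of length 'lag0'.
gaussian_kernel <- function(len) {
  x <- seq(-2, 2, length.out = len)
  k <- exp(-x^2 / 2)
  k / sum(k)                                     # normalize so weights sum to 1
}

smooth_log_p <- function(pvec, lag0) {
  logp <- log10(pvec)
  k <- gaussian_kernel(lag0)
  as.numeric(stats::filter(logp, k, sides = 2))  # centered moving convolution
}

# e.g., flag points where the smoothed log10 p-value stays below log10(0.05)
```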
This dataset includes all raw data and statistical software code required to reproduce the analyses and graphics in the manuscript:

McKenzie-Reynolds, P., I. A. Owolabi, G. R. Burke, A. Levi, A. M. Simmons, and Q. D. Read. 2025. Wild sources for host plant resistance to Bemisia tabaci in watermelon: insights from behavioral and chemical analyses. Crop Protection, in review. Citation pending. (ARIS log 426869)

Whitefly infestations, primarily caused by Bemisia tabaci, pose a significant threat to watermelon production, leading to severe yield losses and increased reliance on chemical pesticides. We conducted a study to evaluate the potential of the desert watermelon Citrullus colocynthis and other Citrullus species genotypes for resistance to B. tabaci using oviposition assays, vertical Y-tube olfactometer assays, and gas chromatography-mass spectrometry (GC-MS) analysis of plant volatiles. This dataset contains all the raw and processed data and statistical software code needed to reproduce the analyses and graphics in the associated manuscript. Our statistical analysis includes Bayesian generalized linear mixed models fit to the oviposition and Y-tube olfactometer datasets, with posterior distributions of the model parameters used to estimate means for each genotype and test hypotheses comparing them. In this dataset we have included CSV files of the raw data, R statistical software code contained in RMarkdown notebooks, HTML rendered output of the notebooks including all figures, tables, and textual descriptions of the results, and pre-fit model objects so that the notebooks may be rendered without refitting the models. The findings in the accompanying manuscript provide critical insights into resistance mechanisms in C. colocynthis and advance sustainable watermelon production, reducing chemical pesticide dependence and enhancing economic returns for growers.

A full description of all files included in the dataset is found in README.pdf.
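For orientation, the sketch below shows how a pre-fit brms model could be combined with emmeans to obtain genotype means and pairwise comparisons, mirroring the analysis described above. It is not taken from the archived notebooks; the file name oviposition_fit.rds and the predictor name genotype are hypothetical.

```r
# Illustrative sketch (not the archived notebooks): genotype means and pairwise
# contrasts from a pre-fit Bayesian GLMM. File and variable names are hypothetical.
library(brms)
library(emmeans)

fit <- readRDS("oviposition_fit.rds")   # hypothetical pre-fit brmsfit object
emm <- emmeans(fit, ~ genotype)         # posterior estimated marginal means per genotype
emm
contrast(emm, method = "pairwise")      # compare genotypes against each other
```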
Check out more recent versions of the R code here: https://gitlab.ub.uni-bielefeld.de/khoette/rcode_eurace
The data publication contains all resources (data, code, and statistical output) required to reproduce the results presented in "How to accelerate green technology diffusion? An agent-based approach to directed technological change with coevolving absorptive capacity" (Hötte 2019). The objective of this publication is to make the simulation model and statistical analysis transparent, reproducible, and reusable.
The publication is composed of four directories: (1) The directory "model" allows the reader to understand the implementation of the simulation model (C code), to reproduce the simulated data, and to use the model for further studies. A conceptual description and technical documentation of the model are provided in the paper mentioned above. (2) The directory "experiment_directories_and_data" contains the simulated data that are presented and discussed in the paper. These data allow the statistical analyses presented in the paper to be reproduced exactly and the general validity of the model to be checked.
(3) The directory "rcode" contains the code that was used for the statistical analyses and makes the methods transparent for the reader. (4) The directory "results" contains the output files of the statistical analyses, e.g. plots and txt-output files documenting the regression analyses.
Each directory contains a readme file with additional information about its content and instructions on how to use it.
The data provided should allow you to REPRODUCE the simulations, i.e. to produce your own simulation data that should exhibit the same patterns as those discussed in the paper. Before trying to run the model, I strongly recommend checking out the introductory and explanatory material provided by the developers of the original model: http://www.wiwi.uni-bielefeld.de/lehrbereiche/vwl/etace/Eurace_Unibi/
- my_library_functions.c: running order of vintages adjusted by using costs
- its: initial population
This model is a modified version of the Eurace@Unibi model, developed by Herbert Dawid, Simon Gemkow, Philipp Harting, Sander van der Hoog and Michael Neugart, as an extension of the research within the EU 6th Framework Project Eurace.
These data allow you to perform STATISTICAL ANALYSES with the simulation output yourself. You may use them as input to the R code.
Experiment folders contain simulation files and simulation output:
- baseline
- rand_barr
- rand_pol34_fix_barr5
- rand_pol34_rand_barr15
In principle, you should be able to reproduce the simulated data with the code provided in "model" (note that the model has stochastic components, hence the results will not be EXACTLY the same, but they should be sufficiently similar).
This documentation makes the STATISTICAL METHODS used in the paper transparent. Sorry for the inefficient code. Check whether updates are available.
Output of the analysed data. Here you find the output of the statistical analyses, i.e. regression output and Wilcoxon test results in txt format, and the plots that are used in the paper. These files can be reproduced with the R code.
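To give a concrete (but hypothetical) picture of how the pieces fit together, the sketch below reads simulation output from two of the experiment folders listed above and runs a Wilcoxon test of the kind reported in the results. The file name output.csv and the column green_share are placeholders, not actual files from the package.

```r
# Illustrative sketch only: compare an outcome between the baseline and one
# treatment experiment with a Wilcoxon test. 'output.csv' and 'green_share'
# are hypothetical placeholders for the package's actual output files/columns.
baseline <- read.csv("experiment_directories_and_data/baseline/output.csv")
barrier  <- read.csv("experiment_directories_and_data/rand_barr/output.csv")

wilcox.test(baseline$green_share, barrier$green_share)
```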
The results are documented experiment-wise, i.e. baseline, barrier strength, policy with fixed barriers, and policy with random barriers.
Particular gratitude is owed to Cord Wiljes for extensive support accompanying this data publication.