Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Individual participant data (IPD) meta-analyses that obtain "raw" data from studies rather than summary data typically adopt a "two-stage" approach to analysis, whereby IPD within trials generate summary measures, which are then combined using standard meta-analytical methods. Recently, a range of "one-stage" approaches, which combine all individual participant data in a single meta-analysis, have been suggested as providing a more powerful and flexible approach. However, they are more complex to implement and require statistical support. This study uses a dataset to compare "two-stage" and "one-stage" models of varying complexity, to ascertain whether results obtained from the two approaches differ in a clinically meaningful way.
Methods and Findings: We included data from 24 randomised controlled trials evaluating antiplatelet agents for the prevention of pre-eclampsia in pregnancy. We performed two-stage and one-stage IPD meta-analyses to estimate the overall treatment effect and to explore potential treatment interactions whereby particular types of women and their babies might benefit differentially from receiving antiplatelets. Two-stage and one-stage approaches gave similar results, showing a benefit of using antiplatelets (relative risk 0.90, 95% CI 0.84 to 0.97). Neither approach suggested that any particular type of woman benefited more or less from antiplatelets. There were no material differences in results between different types of one-stage model.
Conclusions: For these data, two-stage and one-stage approaches to analysis produce similar results. Although one-stage models offer a flexible environment for exploring model structure, and are useful where across-study patterns relating to types of participant, intervention and outcome mask similar relationships within trials, the additional insights provided by their usage may not outweigh the costs of statistical support for routine application in syntheses of randomised controlled trials. Researchers considering undertaking an IPD meta-analysis should not necessarily be deterred by a perceived need for sophisticated statistical methods when combining information from large randomised trials.
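To make the distinction concrete, the following is a minimal, illustrative Python sketch (not the study's analysis code, and using made-up 2x2 trial counts) of the "two-stage" route: each trial is first reduced to a log relative risk and its variance, and those trial summaries are then pooled with inverse-variance weights. A one-stage model would instead fit a single participant-level regression across all trials.

```python
# Illustrative two-stage pooling of trial-level relative risks (hypothetical counts).
import math

# Per-trial counts: (events_treatment, n_treatment, events_control, n_control)
trials = [(30, 200, 40, 200), (12, 150, 18, 150), (55, 400, 70, 400)]

log_rrs, variances = [], []
for a, n1, c, n0 in trials:
    rr = (a / n1) / (c / n0)
    log_rrs.append(math.log(rr))
    # Delta-method variance of the log relative risk for a 2x2 table
    variances.append(1 / a - 1 / n1 + 1 / c - 1 / n0)

weights = [1 / v for v in variances]
pooled_log_rr = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)
se = math.sqrt(1 / sum(weights))

print(f"Pooled RR = {math.exp(pooled_log_rr):.2f}, "
      f"95% CI {math.exp(pooled_log_rr - 1.96 * se):.2f} "
      f"to {math.exp(pooled_log_rr + 1.96 * se):.2f}")
```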
https://www.gnu.org/licenses/gpl-3.0-standalone.html
Replication Package for "A Study on the Pythonic Functional Constructs' Understandability" to appear at ICSE 2024
Authors: Cyrine Zid, Fiorella Zampetti, Giuliano Antoniol, Massimiliano Di Penta
Article Preprint: https://mdipenta.github.io/files/ICSE24_funcExperiment.pdf
Artifacts: https://doi.org/10.5281/zenodo.8191782
License: GPL V3.0
This package contains folders and files with code and data used in the study described in the paper. In the following, we first provide all fields required for the submission, and then report a detailed description of all repository folders.
Artifact Description
Purpose
The artifact is about a controlled experiment aimed at investigating the extent to which Pythonic functional constructs have an impact on source code understandability. The artifact archive contains:
The material to allow replicating the study (see Section Experimental-Material)
Raw quantitative results, working datasets, and scripts to replicate the statistical analyses reported in the paper. Specifically, the executable part of the replication package reproduces figures and tables of the quantitative analysis (RQ1 and RQ2) of the paper starting from the working datasets.
Spreadsheets used for the qualitative analysis (RQ3).
We apply for the following badges:
Available and Reusable: we provide all the material needed to replicate the experiment, as well as to perform the statistical analyses and the qualitative analyses (spreadsheets, in this case)
Provenance
Paper preprint link: https://mdipenta.github.io/files/ICSE24_funcExperiment.pdf
Artifacts: https://doi.org/10.5281/zenodo.8191782
Data
Results have been obtained by conducting the controlled experiment involving Prolific workers as participants. Data collection and processing followed a protocol approved by the University ethical board. Note that all data enclosed in the artifact is completely anonymized and does not contain sensitive information.
Further details about the provided dataset can be found in the Section Results' directory and files
Setup and Usage (for executable artifacts):
See the Section Scripts to reproduce the results, and instructions for running them
Experiment-Material/
Contains the material used for the experiment, and, specifically, the following subdirectories:
Google-Forms/
Contains (as PDF documents) the questionnaires submitted to the ten experimental groups.
Task-Sources/
Contains, for each experimental group (G-1...G-10), the sources used to produce the Google Forms, and, specifically: - The cover letter (Letter.docx). - A directory for each experimental task (Lambda 1, Lambda 2, Comp 1, Comp 2, MRF 1, MRF 2, Lambda Comparison, Comp Comparison, MRF Comparison). Each task directory contains the exercise text (in both Word and .txt format), the source code snippet, and its .png image to be used in the form. Note: the "Comparison" tasks do not have any exercise because their purpose is always the same, i.e., to compare the (perceived) understandability of the snippets and return the result of the comparison.
Code-Examples-Table1/
Contains the source code snippets used as objects of the study (the same you can find under "Task-Sources/"), named as reported in Table 1.
Results' directory and files
raw-responses/
Contains, as spreadsheets, the raw responses provided by the study participants through Google forms.
raw-results-RQ1/
Contains the raw results for RQ1. Specifically, the directory contains a subdirectory for each group (G1-G10). Each subdirectory contains: - For each user, a directory (named using their Prolific ID) containing, for each question (Q1-Q6), the produced Python code (Qn.py), its output (QnR.txt), and its stderr output (QnErr.txt). - "expected-outputs/": a directory containing the expected outputs for each task (Qn.txt).
working-results/RQ1-RQ2-files-for-statistical-analysis/
Contains three .csv files used as input for conducting the statistical analysis and drawing the graphs for addressing the first two research questions of the study. Specifically:
ConstructUsage.csv contains the declared usage frequency of the three functional constructs under study. This file is used to draw Figure 4. The file contains an entry for each participant, reporting the (text-coded) frequency of construct usage for Comprehension, Lambda, and MRF.
RQ1.csv contains the collected data used for the mixed-effect logistic regression relating the use of functional constructs to the correctness of the change task, as well as for the logistic regression relating the use of map/reduce/filter functions to the correctness of the change task. The csv file contains an entry for each answer provided by each subject, and features the following columns (an illustrative analysis sketch is given after the column list):
Group: experimental group to which the participant is assigned
User: user ID
Time: task time in seconds
Approvals: number of approvals on previous tasks performed on Prolific
Student: whether the participant declared themselves as a student
Section: section of the questionnaire (lambda, comp, or mrf)
Construct: specific construct being presented (same as "Section" for lambda and comp, for mrf it says whether it is a map, reduce, or filter)
Question: question id, from Q1 to Q6, indicating the ordering of the questions
MainFactor: main factor treatment for the given question - "f" for functional, "p" for procedural counterpart
Outcome: TRUE if the task was correctly performed, FALSE otherwise
Complexity: cyclomatic complexity of the construct (empty for mrf)
UsageFrequency: usage frequency of the given construct
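As an illustration of how RQ1.csv can be analyzed, here is a minimal sketch assuming the column names listed above. It is not the paper's analysis script: the paper uses a mixed-effect logistic regression, whereas this simplified version omits the per-participant random effect.

```python
# Sketch: logistic regression of change-task correctness on the main factor
# (functional vs. procedural counterpart), construct, and declared usage frequency.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("RQ1.csv")
# Outcome is TRUE/FALSE; convert it to a 0/1 indicator.
df["Correct"] = df["Outcome"].astype(str).str.upper().eq("TRUE").astype(int)

model = smf.logit("Correct ~ C(MainFactor) + C(Construct) + C(UsageFrequency)",
                  data=df).fit()
print(model.summary())
```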
RQ1Paired-RQ2.csv contains the collected data used for the ordinal logistic regression of the relationship between the perceived ease of understanding of the functional constructs and (i) participants' usage frequency, and (ii) the constructs' complexity (except for map/reduce/filter). The file features a row for each participant, and the columns are the following (an illustrative sketch follows the column list):
Group: experimental group to which the participant is assigned
User: user ID
Time: task time in seconds
Approvals: number of approvals on previous tasks performed on Prolific
Student: whether the participant declared themselves as a student
LambdaF: result for the change task related to a lambda construct
LambdaP: result for the change task related to the procedural counterpart of a lambda construct
CompF: result for the change task related to a comprehension construct
CompP: result for the change task related to the procedural counterpart of a comprehension construct
MrfF: result for the change task related to an MRF construct
MrfP: result for the change task related to the procedural counterpart of an MRF construct
LambdaComp: perceived understandability level for the comparison task (RQ2) between a lambda and its procedural counterpart
CompComp: perceived understandability level for the comparison task (RQ2) between a comprehension and its procedural counterpart
MrfComp: perceived understandability level for the comparison task (RQ2) between an MRF construct and its procedural counterpart
LambdaCompCplx: cyclomatic complexity of the lambda construct involved in the comparison task (RQ2)
CompCompCplx: cyclomatic complexity of the comprehension construct involved in the comparison task (RQ2)
MrfCompType: type of MRF construct (map, reduce, or filter) used in the comparison task (RQ2)
LambdaUsageFrequency: self-declared usage frequency on lambda constructs
CompUsageFrequency: self-declared usage frequency on comprehension constructs
MrfUsageFrequency: self-declared usage frequency on MRF constructs
LambdaComparisonAssessment: outcome of the manual assessment of the answer to the "check question" required for the lambda comparison ("yes" means valid, "no" means wrong, "moderatechatgpt" and "extremechatgpt" are the results of GPTZero)
CompComparisonAssessment: as above, but for comprehension
MrfComparisonAssessment: as above, but for MRF
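A minimal, hedged sketch of an ordinal logistic regression on one of the comparison ratings is shown below; column names follow the description above, and the replication package's own scripts should be preferred for the exact model specification.

```python
# Sketch: ordinal logistic regression of the perceived understandability of the lambda
# comparison (LambdaComp) on the cyclomatic complexity of the compared lambda.
# In practice the ordering of the response levels should be set explicitly.
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.read_csv("RQ1Paired-RQ2.csv")
y = df["LambdaComp"].astype(pd.CategoricalDtype(ordered=True))
X = df[["LambdaCompCplx"]].astype(float)

res = OrderedModel(y, X, distr="logit").fit(method="bfgs", disp=False)
print(res.summary())
```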
working-results/inter-rater-RQ3-files/
This directory contains four .csv files used as input for computing the inter-rater agreement for the manual labeling used for addressing RQ3. Specifically, you will find one file for each functional construct, i.e., comprehension.csv, lambda.csv, and mrf.csv, plus an additional file, procedural.csv, capturing the reasons why participants prefer the procedural paradigm.
working-results/RQ2ManualValidation.csv
This file contains the results of the manual validation performed to sanitize the answers provided by our participants and used for addressing RQ2. Specifically, we coded each behaviour description using four different levels: (i) correct ("yes"), (ii) somewhat correct ("partial"), (iii) wrong ("no"), and (iv) automatically generated. The file features a row for each participant, and the columns are the following:
ID: ID we used to refer to the participant in the paper's qualitative analysis
Group: experimental group to which the participant is assigned
ProlificID: user ID
Comparison for lambda construct description: answer provided by the user for the lambda comparison task
Final Classification: our assessment of the lambda comparison answer
Comparison for comprehension description: answer provided by the user for the comprehension comparison task
Final Classification: our assessment of the comprehension comparison answer
Comparison for MRF description: answer provided by the user for the MRF comparison task
Final Classification: our assessment of the MRF comparison answer
working-results/RQ3ManualValidation.xlsx
This file contains the results of the open coding applied to address our third research question. Specifically, you will find four sheets, one for each functional construct and one for the procedural paradigm. Each sheet reports the provided answers together with the categories assigned to them. Each sheet contains the following columns:
ID: ID we used to refer to the participant in the paper's qualitative analysis
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In our everyday lives, we are required to make decisions based upon our statistical intuitions. Often, these involve the comparison of two groups, such as luxury versus family cars and their suitability. Research has shown that the mean difference affects judgements where two sets of data are compared, but the variability of the data has only a minor influence, if any at all. However, prior research has tended to present raw data as simple lists of values. Here, we investigated whether displaying data visually, in the form of parallel dot plots, would lead viewers to incorporate variability information. In Experiment 1, we asked a large sample of people to compare two fictional groups (children who drank ‘Brain Juice’ versus water) in a one-shot design, where only a single comparison was made. Our results confirmed that only the mean difference between the groups predicted subsequent judgements of how much they differed, in line with previous work using lists of numbers. In Experiment 2, we asked each participant to make multiple comparisons, with both the mean difference and the pooled standard deviation varying across data sets they were shown. Here, we found that both sources of information were correctly incorporated when making responses. Taken together, we suggest that increasing the salience of variability information, through manipulating this factor across items seen, encourages viewers to consider this in their judgements. Such findings may have useful applications for best practices when teaching difficult concepts like sampling variation.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes the following columns:
A. a sequential student ID
B. an ID that defines a random group label and the notation
C. the used notation: User Story or Use Case
D. the case they were assigned to: IFA, Sim, or Hos
E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
F. a categorical representation of the grade (L/M/H), where H is greater than or equal to 80, M is at least 65 and below 80, and L otherwise
G. the total number of classes in the student's conceptual model
H. the total number of relationships in the student's conceptual model
I. the total number of classes in the expert's conceptual model
J. the total number of relationships in the expert's conceptual model
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, and missing (see tagging scheme below)
P. the researchers' judgement on how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present.
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than a class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent a legacy system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.
All the calculations and information provided in the following sheets
originate from that raw data.
Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,
including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The number of classes in the student model divided by the number of classes in the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes; however, we also provide the size ratio for the number of relationships between the student and expert models.
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
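For reference, the two definitions above translate directly into the following small Python functions (illustrative only, using made-up counts):

```python
# Correctness and completeness as defined above, from the per-student counts of
# aligned (AL), wrongly represented (WR), system-oriented (SO) and omitted (OM) classes.
def correctness(al, wr, so, om):
    return al / (al + om + so + wr)

def completeness(al, wr, om):
    return (al + wr) / (al + wr + om)

# Example with made-up counts:
print(correctness(al=12, wr=3, so=2, om=5))   # 12 / 22 = 0.545...
print(completeness(al=12, wr=3, om=5))        # 15 / 20 = 0.75
```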
For sheet 4 as well as for the following four sheets, diverging stacked bar
charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (t-test) and effect size (Hedges' g; a computational sketch is given after the sheet list below) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation - UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case - SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by exam grade, converted to the categorical values High, Medium, and Low.
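The sheets compute Hedges' g with the linked online tool; for readers who prefer to recompute it, the standard formula (standardized mean difference with a small-sample correction) is sketched below in Python, under the assumption of two independent groups of scores.

```python
import math

def hedges_g(x, y):
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)   # sample variances
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    s_pooled = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    d = (mx - my) / s_pooled                        # Cohen's d
    return d * (1 - 3 / (4 * (nx + ny) - 9))        # small-sample bias correction

# Example with made-up correctness values for two groups:
print(hedges_g([0.8, 0.7, 0.9, 0.6], [0.5, 0.6, 0.4, 0.7]))
```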
The main objective of the HEIS survey is to obtain detailed data on household expenditure and income, linked to various demographic and socio-economic variables, to enable computation of poverty indices and determine the characteristics of the poor and prepare poverty maps. Therefore, to achieve these goals, the sample had to be representative on the sub-district level. The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality.
Data collected through the survey helped in achieving the following objectives:
1. Provide data weights that reflect the relative importance of consumer expenditure items used in the preparation of the consumer price index
2. Study the consumer expenditure pattern prevailing in the society and the impact of demographic and socio-economic variables on those patterns
3. Calculate the average annual income of the household and the individual, and assess the relationship between income and different economic and social factors, such as profession and educational level of the head of the household and other indicators
4. Study the distribution of individuals and households by income and expenditure categories and analyze the factors associated with it
5. Provide the necessary data for the national accounts related to overall consumption and income of the household sector
6. Provide the necessary income data to serve in calculating poverty indices and identifying the characteristics of the poor, as well as drawing poverty maps
7. Provide the data necessary for the formulation, follow-up and evaluation of economic and social development programs, including those addressed to eradicate poverty
National
The survey covered a national sample of households and all individuals permanently residing in surveyed households.
Sample survey data [ssd]
The 2008 Household Expenditure and Income Survey sample was designed using a two-stage stratified cluster sampling method. In the first stage, the primary sampling units (PSUs), the blocks, were drawn with probability proportional to size, taking the number of households in each block as the block size. The second stage included drawing the household sample (8 households from each PSU) using the systematic sampling method. Four substitute households from each PSU were also drawn, using the systematic sampling method, to be used on the first visit to the block in case any of the main sample households could not be visited for any reason.
To estimate the sample size, the coefficient of variation and the design effect in each sub-district were calculated for the expenditure variable from the data of the 2006 Household Expenditure and Income Survey. These results were used to estimate the sample size at the sub-district level, provided that the coefficient of variation of the expenditure variable at the sub-district level did not exceed 10%, with a minimum of 6 clusters at the district level, to ensure good cluster representation in the administrative areas and to enable drawing poverty pockets.
It is worth mentioning that the expected non-response, as well as areas where poor families are concentrated in the major cities, were taken into consideration when designing the sample. Therefore, a larger sample was taken from these areas compared with other ones, in order to help in reaching and covering the poverty pockets.
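For illustration only (this is not the official survey software), the two selection stages described above can be sketched in Python as probability-proportional-to-size selection of blocks followed by systematic selection of 8 households per block; the sampling frame below is entirely hypothetical.

```python
import random

def pps_systematic(block_sizes, n_psus):
    """Systematic PPS selection of PSUs, using household counts as measures of size."""
    blocks = list(block_sizes)
    cumulative, total = [], 0
    for b in blocks:
        total += block_sizes[b]
        cumulative.append(total)
    interval = total / n_psus
    start = random.random() * interval
    selected, i = [], 0
    for k in range(n_psus):
        point = start + k * interval
        while cumulative[i] < point:
            i += 1
        selected.append(blocks[i])
    return selected

def systematic_sample(units, n=8):
    """Systematic selection of n units (households) within a selected block."""
    step = len(units) / n
    start = random.random() * step
    return [units[int(start + k * step)] for k in range(n)]

# Hypothetical frame: 200 blocks with 80-160 households each.
frame = {f"block{i}": random.randint(80, 160) for i in range(200)}
psus = pps_systematic(frame, n_psus=25)
households = systematic_sample(list(range(frame[psus[0]])), n=8)
print(psus[:5], households)
```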
Face-to-face [f2f]
List of survey questionnaires: (1) General Form (2) Expenditure on food commodities Form (3) Expenditure on non-food commodities Form
Raw Data: The design and implementation of this survey involved the following procedures:
1. Sample design and selection
2. Design of forms/questionnaires, guidelines to assist in filling out the questionnaires, and preparing instruction manuals
3. Design of the table templates to be used for the dissemination of the survey results
4. Preparation of the fieldwork phase, including printing forms/questionnaires, instruction manuals, data collection instructions, data checking instructions and codebooks
5. Selection and training of survey staff to collect data and run required data checks
6. Preparation and implementation of the pretest phase of the survey, designed to test and develop forms/questionnaires, instructions and software programs required for data processing and production of survey results
7. Data collection
8. Data checking and coding
9. Data entry
10. Data cleaning using data validation programs
11. Data accuracy and consistency checks
12. Data tabulation and preliminary results
13. Preparation of the final report and dissemination of final results
Harmonized Data:
- The Statistical Package for Social Science (SPSS) was used to clean and harmonize the datasets
- The harmonization process started with cleaning all raw data files received from the Statistical Office
- Cleaned data files were then all merged to produce one data file on the individual level containing all variables subject to harmonization
- A country-specific program was generated for each dataset to generate/compute/recode/rename/format/label harmonized variables
- A post-harmonization cleaning process was run on the data
- Harmonized data was saved on the household as well as the individual level, in SPSS, and converted to STATA format
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The archive contains the following components described below.
Directory "dependencies/":
Directory "plots/4_Equalization/":
Directory "plots/8_User_study/":
Directory "resources/BRIR_auralization/":
Directory "resources/BRIR_rendered/":
Directory "resources/HPCF_KEMAR/":
Directory "resources/User_study/":
Matlab script "x4_Gather_Headphone_Compensations.m":
Matlab script "x6_Gather_SSR_Configurations.m":
Readme file "x6a_Normalize_SSR_Loudnesses.txt":
Matlab script "x6_Gather_SSR_Configurations.m":
Shell script "x7_Start_Study_GUI.sh":
Matlab script "x8_Gather_Study_Data.m":
Matlab script "x8a_Plot_Study_Data.m":
R script "x8b_Analyze_Exp1_Data.R":
R markdown script "x8c_Plot_Exp1_Results.Rmd":
The performance standard for ballistic-resistant body armor published by the National Institute of Justice (NIJ), NIJ Standard 0101.06, recommends estimating the perforation performance of body armor by performing a statistical analysis on V50 ballistic limit testing data. The first objective of this study is to evaluate and compare the estimates of performance provided by different statistical methods applied to ballistic data generated in the laboratory. Three different distribution models able to describe the relationship between the projectile velocity and the probability of perforation are considered: the logistic, the probit, and the complementary log-log response models. A secondary objective of this study is to apply the different methods to a new body armor model with unusual ballistic limit results, leading one to suspect that it may not be best described by a symmetric model, to determine whether this data can be better fitted by a model other than the logistic model. This work has been published as NISTIR 7760, "Analysis of Three Different Regression Models to Estimate the Ballistic Performance of New and Environmentally Conditioned Body Armor." The raw data (ballistic limit data) associated with this prior publication is archived in this dataset.
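As a hedged illustration (not the NIST analysis code, and using made-up velocities and outcomes), the kind of model described above can be fitted by maximum likelihood as follows; swapping the logistic link for the normal CDF or the complementary log-log function gives the probit and cloglog variants.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical V50-style data: impact velocities (m/s) and perforation outcomes (0/1).
velocity = np.array([410, 420, 425, 430, 435, 440, 445, 450, 460, 470], dtype=float)
perforated = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1], dtype=float)
vc = velocity - velocity.mean()               # centre the covariate for stability

def link(eta):
    # Logistic response model; probit would use scipy.stats.norm.cdf(eta),
    # complementary log-log would use 1 - np.exp(-np.exp(eta)).
    return 1.0 / (1.0 + np.exp(-eta))

def neg_log_likelihood(params):
    a, b = params
    p = np.clip(link(a + b * vc), 1e-9, 1 - 1e-9)
    return -np.sum(perforated * np.log(p) + (1 - perforated) * np.log(1 - p))

fit = minimize(neg_log_likelihood, x0=np.array([0.0, 0.1]), method="Nelder-Mead")
a_hat, b_hat = fit.x
# For the (symmetric) logistic model, the 50% perforation velocity is where eta = 0.
print(f"Estimated V50 ~ {velocity.mean() - a_hat / b_hat:.1f} m/s")
```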
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the replication package for the paper titled "How Do Requirements Evolve During Elicitation? An Empirical Study Combining Interviews and App Store Analysis", by Alessio Ferrari, Paola Spoletini and Sourav Debnath.
The package contains the following folders and files.
/R-analysis
This is a folder containing all the R implementations of the statistical tests included in the paper, together with the source .csv files used to produce the results. Each R file has the same title as the associated .csv file. The titles of the files reflect the RQs as they appear in the paper. The association between R files and tables in the paper is as follows:
- RQ1-1-analyse-story-rates.R: Table 1, user story rates
- RQ1-1-analyse-role-rates.R: Table 1, role rates
- RQ1-2-analyse-story-category-phase-1.R: Table 3, user story category rates in phase 1 compared to original rates
- RQ1-2-analyse-role-category-phase-1.R: Table 5, role category rates in phase 1 compared to original rates
- RQ2.1-analysis-app-store-rates-phase-2.R: Table 8, user story and role rates in phase 2
- RQ2.2-analysis-percent-three-CAT-groups-ph1-ph2.R: Table 9, comparison of the categories of user stories in phase 1 and 2
- RQ2.2-analysis-percent-two-CAT-roles-ph1-ph2.R: Table 10, comparison of the categories of roles in phase 1 and 2.
The .csv files used for the statistical tests are also used to produce boxplots. The association between boxplot figures and files is as follows.
- RQ1-1-story-rates.csv: Figure 4
- RQ1-1-role-rates.csv: Figure 5
- RQ1-2-categories-phase-1.csv: Figure 8
- RQ1-2-role-category-phase-1.csv: Figure 9
- RQ2-1-user-story-and-roles-phase-2.csv: Figure 13
- RQ2.2-percent-three-CAT-groups-ph1-ph2.csv: Figure 14
- RQ2.2-percent-two-CAT-roles-ph1-ph2.csv: Figure 17
- IMG-only-RQ2.2-us-category-comparison-ph1-ph2.csv: Figure 15
- IMG-only-RQ2.2-frequent-roles.csv: Figure 18
NOTE: The last two .csv files do not have an associated statistical test, but are used solely to produce boxplots.
/Data-Analysis
This folder contains all the data used to answer the research questions.
RQ1.xlsx: includes all the data associated with the RQ1 subquestions, with two tabs for each subquestion (one for user stories and one for roles). The names of the tabs are self-explanatory.
RQ2.1.xlsx: includes all the data for the RQ2.1 subquestion. Specifically, it includes the following tabs:
- Data Source-US-category: for each category of user story, and for each analyst, there are two lines.
The first one reports the number of user stories in that category for phase 1, and the second one reports the
number of user stories in that category for phase 2, considering the specific analyst.
- Data Source-role: for each category of role, and for each analyst, there are two lines.
The first one reports the number of user stories in that role for phase 1, and the second one reports the
number of user stories in that role for phase 2, considering the specific analyst.
- RQ2.1 rates: reports the final rates for RQ2.1.
NOTE: The other tabs are used to support the computation of the final rates.
RQ2.2.xlsx: includes all the data for the RQ2.2 subquestion. Specifically, it includes the following tabs:
- Data Source-US-category: same as RQ2.1.xlsx
- Data Source-role: same as RQ2.1.xlsx
- RQ2.2-category-group: comparison between groups of categories in the different phases, used to produce Figure 14
- RQ2.2-role-group: comparison between role groups in the different phases, used to produce Figure 17
- RQ2.2-specific-roles-diff: difference between specific roles, used to produce Figure 18
NOTE: the other tabs are used to support the computation of the values reported in the tabs above.
RQ2.2-single-US-category.xlsx: includes the data for the RQ2.2 subquestion associated to single categories of user stories.
A separate tab is used given the complexity of the computations.
- Data Source-US-category: same as RQ2.1.xlsx
- Totals: total number of user stories for each analyst in phase 1 and phase 2
- Results-Rate-Comparison: difference between rates of user stories in phase 1 and phase 2, used to produce the file
"img/IMG-only-RQ2.2-us-category-comparison-ph1-ph2.csv", which is in turn used to produce Figure 15
- Results-Analysts: number of analysts using each novel category produced in phase 2, used to produce Figure 16.
NOTE: the other tabs are used to support the computation of the values reported in the tabs above.
RQ2.3.xlsx: includes the data for the RQ2.3 subquestion. Specifically, it includes the following tabs:
- Data Source-US-category: same as RQ2.1.xlsx
- Data Source-role: same as RQ2.1.xlsx
- RQ2.3-categories: novel categories produced in phase 2, used to produce Figure 19
- RQ2-3-most-frequent-categories: most frequent novel categories
/Raw-Data-Phase-I
The folder contains one Excel file for each analyst, s1.xlsx...s30.xlsx, plus the file of the original user stories with annotations (original-us.xlsx). Each file contains two tabs:
- Evaluation: includes the annotation of the user stories as existing user stories in the original categories (annotated with "E"), novel user stories in a certain category (refinement, annotated with "N"), or novel user stories in a novel category (name of the category in column "New Feature"). NOTE 1: in the paper, the "refinement" case is said to be annotated with "R" (instead of "N", as in the files) to make the paper clearer and easier to read.
- Roles: roles used in the user stories, and count of the user stories belonging to a certain role.
/Raw-Data-Phaes-II
The folder contains one Excel file for each analyst, s1.xlsx...s30.xlsx. Each file contains two tabs:
- Analysis: includes the annotation of the user stories as belonging to existing original
category (X), or to categories introduced after interviews, or to categories introduced
after app store inspired elicitation (name of category in "Cat. Created in PH1"), or to
entirely novel categories (name of category in "New Category").
- Roles: roles used in the user stories, and count of the user stories belonging to a certain role.
/Figures
This folder includes the figures reported in the paper. The boxplots are generated from the
data using the tool http://shiny.chemgrid.org/boxplotr/. The histograms and other plots are
produced with Excel, and are also reported in the excel files listed above.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description of Source Code and Raw Data
Overview: The provided source code and raw data files are designed for evaluating the performance of a proposed algorithm using 15 widely recognized benchmark functions. These functions are critical for assessing the algorithm's efficiency, robustness, and effectiveness in optimization tasks. The evaluation is conducted across three different dimensions: 10, 20, and 30, providing a comprehensive analysis of the algorithm's capability to handle varying complexities.
Components:
1. Source Code:
The source code is implemented to execute the proposed algorithm on the benchmark functions. It includes modules for initializing populations, applying genetic operations (selection, crossover, mutation), and measuring performance metrics such as fitness value, convergence rate, and computational time.
The code is adaptable for different dimensional settings (10, 20, 30 dimensions) and can be easily modified to adjust parameters such as population size, iteration count, and genetic operators' specifics.
The algorithm is tested against a suite of 15 benchmark functions, each representing a unique challenge in the optimization landscape, including unimodal, multimodal, separable, and non-separable functions.
2. Raw Data:
The raw data consists of the results generated by running the proposed algorithm on each benchmark function across the three dimensional settings (10, 20, and 30 dimensions).
Data includes multiple runs to ensure statistical significance, capturing metrics like the best and average fitness values, standard deviation, and convergence behavior over the iterations.
This data is crucial for performing comparative analysis, highlighting the strengths and weaknesses of the proposed algorithm relative to existing methods.
Benchmark Functions:
· The 15 benchmark functions include a mix of well-known test cases such as Sphere, Rosenbrock, Rastrigin, Ackley, and others. Each function is crafted to test different aspects of optimization algorithms, from dealing with high-dimensional search spaces to escaping local optima.
· The functions are provided in the standard mathematical form, and the source code includes implementation details for each.
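For orientation, three of the named functions are sketched below in Python (hedged: the exact variants, shifts, and bounds used in the provided source code may differ).

```python
import math

def sphere(x):
    return sum(v * v for v in x)

def rastrigin(x):
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)

def ackley(x):
    n = len(x)
    s1 = sum(v * v for v in x) / n
    s2 = sum(math.cos(2 * math.pi * v) for v in x) / n
    return -20 * math.exp(-0.2 * math.sqrt(s1)) - math.exp(s2) + 20 + math.e

for f in (sphere, rastrigin, ackley):
    print(f.__name__, f([0.0] * 10))   # all three have their global optimum 0 at the origin
```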
Purpose:
· The primary goal of this package is to validate the effectiveness of the proposed algorithm against standard benchmarks in the field. The source code enables reproducibility of results, while the raw data serves as a baseline for further research and comparison with other optimization techniques.
Usage:
· Researchers can use the provided source code to replicate the experiments or adapt the algorithm for other benchmark functions or dimensional settings.
· The raw data can be analyzed using statistical tools to derive insights into the algorithm's performance across different scenarios.
This data publication contains all material used in Hötte, K., 2019, "Skill transferability and the adoption of new technology: A learning based explanation for patterns of diffusion".
It is composed of (1) the simulation model and the required inputs to reproduce the results, (2) the simulated data presented in the article, (3) the R scripts that were used for the statistical analyses, and (4) selected results and graphics that are partly used in the article and partly supplementary.
Please check for software updates (concerning the model and R code) on GitLab. If you are only interested in the programming code, I recommend checking out GitLab first, because this data publication consumes a lot of disk space due to the large amount of simulated data (~16 GB).
If you have questions, do not hesitate to send me an email: kerstin.hoette[at]uni-bielefeld.de
Technological capabilities are decisive for making effective use of new machinery and capital goods. Firms and employees accumulate these capabilities when working with specific machinery. Radical innovation differs by technology type, and pre-existing capabilities may be imperfectly transferable across types. In this paper, I address the implications of cross-technology transferability of capabilities for firm-level technology adoption and macroeconomic directed technological change. I propose a microeconomically founded model of technological learning that is based on empirical and theoretical insights from the innovation literature. In a simulation study using the ABM Eurace@unibi-eco, applied to the context of green technology diffusion, it is shown that a high transferability of knowledge has ambiguous effects. It accelerates the diffusion process initially, but comes at the cost of technological stability and specialization. For firms, it is easy to adopt, but also easy to switch back to the conventional technology type. It is shown how different types of policies can be used to stabilize the diffusion process. The framework of analysis is used to derive a general characterization of technologies that may provide guidance for future empirical analyses.
See also readme files in the subfolders.
The data provided should allow you to REPRODUCE the simulations, i.e. to produce your own simulation data that should exhibit the same patterns as those discussed in the paper.
my_library_functions.c: Running order of vintages adjusted by using costs.
its: initial population
This data allows you to perform STATISTICAL ANALYSES with the simulation output yourself. You may use these as input to the Rcode.
Experiment folders contain simulation files and simulation output:
- baseline -> with intermediate technological difficulty and distance
- difficulty -> 3 discrete levels of chi^{dist}
- distance -> 3 discrete levels of chi^{int}
monte_carlo_exp:
- both_learning_at_random_3_barr -> Monte Carlo analysis (MC) with fixed barrier and randomly drawn learning parameters
- both_learning_at_random_random_barr -> MC with random learning and a random barrier at max 10 pct; serves as policy baseline
- rand_learn_rand_pol_rand_barr10 -> policy experiment
In principle, you should be able to reproduce the simulated data (Note that the model has stochastic components, hence it will not be EXACTLY the same but sufficiently similar).
This documentation makes the STATISTICAL METHODS used in the paper transparent. Sorry for the inefficient code.
Output of analysed data and additional time series plots. Here, you find additional time series that are not presented in the main article, some descriptive statistics, and the output of statistical tests and analyses, i.e., regression output and Wilcoxon test results in txt format, as well as plots that are used in the paper. These files can be reproduced by the R code.
The author gratefully acknowledges the achievements and provision of free statistical software maintained by the R programming community. This work uses a modified version of the Eurace@Unibi model.
The basic goal of this survey is to provide the necessary database for formulating national policies at various levels. It represents the contribution of the household sector to the Gross National Product (GNP). Household Surveys help as well in determining the incidence of poverty, and providing weighted data which reflects the relative importance of the consumption items to be employed in determining the benchmark for rates and prices of items and services. Generally, the Household Expenditure and Consumption Survey is a fundamental cornerstone in the process of studying the nutritional status in the Palestinian territory.
The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality. Data is a public good, in the interest of the region, and it is consistent with the Economic Research Forum's mandate to make micro data available, aiding regional research on this important topic.
The survey data covers urban, rural and camp areas in West Bank and Gaza Strip.
1- Household/families. 2- Individuals.
The survey covered all the Palestinian households who are a usual residence in the Palestinian Territory.
Sample survey data [ssd]
The sampling frame consists of all enumeration areas which were enumerated in 1997; the enumeration area consists of buildings and housing units and is composed of an average of 120 households. The enumeration areas were used as Primary Sampling Units (PSUs) in the first stage of the sampling selection. The enumeration areas of the master sample were updated in 2003.
The sample is a stratified cluster systematic random sample with two stages: First stage: selection of a systematic random sample of 299 enumeration areas. Second stage: selection of a systematic random sample of 12-18 households from each enumeration area selected in the first stage. A person (18 years or older) was selected from each household in the second stage.
The population was divided by: 1- Governorate 2- Type of Locality (urban, rural, refugee camps)
The calculated sample size is 3,781 households.
The target cluster size or "sample-take" is the average number of households to be selected per PSU. In this survey, the sample take is around 12 households.
Detailed information/formulas on the sampling design are available in the user manual.
Face-to-face [f2f]
The PECS questionnaire consists of two main sections:
First section: Certain articles/provisions of the form are filled out at the beginning of the month, and the remainder is filled out at the end of the month. The questionnaire includes the following provisions:
Cover sheet: It contains detailed particulars of the family, the date of visit, particulars of the field/office work team, and the number/sex of the family members.
Statement of the family members: Contains social, economic and demographic particulars of the selected family.
Statement of the long-lasting commodities and income generation activities: Includes a number of basic and indispensable items (e.g., livestock or agricultural land).
Housing Characteristics: Includes information and data pertaining to the housing conditions, including type of shelter, number of rooms, ownership, rent, water, electricity supply, connection to the sewer system, source of cooking and heating fuel, and remoteness/proximity of the house to education and health facilities.
Monthly and Annual Income: Data pertaining to the income of the family is collected from different sources at the end of the registration / recording period.
Second section: The second section of the questionnaire includes a list of 54 consumption and expenditure groups, itemized and serially numbered according to their importance to the family. Each of these groups contains important commodities. The total number of commodity and service items across all groups is 667. Groups 1-21 include food, drink, and cigarettes. Group 22 includes homemade commodities. Groups 23-45 include all items except food, drink and cigarettes. Groups 50-54 include all of the long-lasting commodities. Data on each of these groups was collected over different intervals of time so as to reflect expenditure over a period of one full year.
Both data entry and tabulation were performed using the ACCESS and SPSS software programs. The data entry process was organized in 6 files, corresponding to the main parts of the questionnaire. A data entry template was designed to reflect an exact image of the questionnaire, and included various electronic checks: logical check, range checks, consistency checks and cross-validation. Complete manual inspection was made of results after data entry was performed, and questionnaires containing field-related errors were sent back to the field for corrections.
The survey sample consists of about 3,781 households interviewed over a twelve-month period between January 2004 and January 2005. There were 3,098 households that completed the interview, of which 2,060 were in the West Bank and 1,038 were in the Gaza Strip. The response rate was 82% in the Palestinian Territory.
The calculations of standard errors for the main survey estimations enable the user to identify the accuracy of estimations and the survey reliability. Total errors of the survey can be divided into two kinds: statistical errors, and non-statistical errors. Non-statistical errors are related to the procedures of statistical work at different stages, such as the failure to explain questions in the questionnaire, unwillingness or inability to provide correct responses, bad statistical coverage, etc. These errors depend on the nature of the work, training, supervision, and conducting all various related activities. The work team spared no effort at different stages to minimize non-statistical errors; however, it is difficult to estimate numerically such errors due to absence of technical computation methods based on theoretical principles to tackle them. On the other hand, statistical errors can be measured. Frequently they are measured by the standard error, which is the positive square root of the variance. The variance of this survey has been computed by using the “programming package” CENVAR.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description. This is the data used in the experiment of the following conference paper:
N. Arınık, R. Figueiredo, and V. Labatut, “Signed Graph Analysis for the Interpretation of Voting Behavior,” in International Conference on Knowledge Technologies and Data-driven Business - International Workshop on Social Network Analysis and Digital Humanities, Graz, AT, 2017, vol. 2025. ⟨hal-01583133⟩
Source code. The source code is accessible on GitHub: https://github.com/CompNet/NetVotes
Citation. If you use the data or source code, please cite the above paper.
@InProceedings{Arinik2017, author = {Arınık, Nejat and Figueiredo, Rosa and Labatut, Vincent}, title = {Signed Graph Analysis for the Interpretation of Voting Behavior}, booktitle = {International Conference on Knowledge Technologies and Data-driven Business - International Workshop on Social Network Analysis and Digital Humanities}, year = {2017}, volume = {2025}, series = {CEUR Workshop Proceedings}, address = {Graz, AT}, url = {http://ceur-ws.org/Vol-2025/paper_rssna_1.pdf},}
Details.
COMPARISON RESULTS
The 'material-stats' folder contains all the comparison results obtained for Ex-CC and ILS-CC. The csv files associated with the plots are also provided. The folder structure is as follows:
* material-stats/
** execTimePerf: The plot shows the execution time of Ex-CC and ILS-CC on randomly generated complete networks of different sizes.
** graphStructureAnalysis: The plots show the weight and link statistics for all instances.
** ILS-CC-vs-Ex-CC: The folder contains 4 different comparisons between Ex-CC and ILS-CC: imbalance difference, number of detected clusters, difference in the number of detected clusters, and NMI (Normalized Mutual Information).
Funding: Agorantic FR 3621, FMJH Program Gaspard Monge in optimization and operations research (Project 2015-2842H)
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Many initiatives encourage investigators to share their raw datasets in hopes of increasing research efficiency and quality. Despite these investments of time and money, we do not have a firm grasp of who openly shares raw research data, who doesn't, and which initiatives are correlated with high rates of data sharing. In this analysis I use bibliometric methods to identify patterns in the frequency with which investigators openly archive their raw gene expression microarray datasets after study publication. Automated methods identified 11,603 articles published between 2000 and 2009 that describe the creation of gene expression microarray data. Associated datasets in best-practice repositories were found for 25% of these articles, increasing from less than 5% in 2001 to 30%-35% in 2007-2009. Accounting for sensitivity of the automated methods, approximately 45% of recent gene expression studies made their data publicly available. First-order factor analysis on 124 diverse bibliometric attributes of the data creation articles revealed 15 factors describing authorship, funding, institution, publication, and domain environments. In multivariate regression, authors were most likely to share data if they had prior experience sharing or reusing data, if their study was published in an open access journal or a journal with a relatively strong data sharing policy, or if the study was funded by a large number of NIH grants. Authors of studies on cancer and human subjects were least likely to make their datasets available. These results suggest research data sharing levels are still low and increasing only slowly, and data is least available in areas where it could make the biggest impact. Let's learn from those with high rates of sharing to embrace the full potential of our research output.
The Associated Press is sharing data from the COVID Impact Survey, which provides statistics about physical health, mental health, economic security and social dynamics related to the coronavirus pandemic in the United States.
Conducted by NORC at the University of Chicago for the Data Foundation, the probability-based survey provides estimates for the United States as a whole, as well as in 10 states (California, Colorado, Florida, Louisiana, Minnesota, Missouri, Montana, New York, Oregon and Texas) and eight metropolitan areas (Atlanta, Baltimore, Birmingham, Chicago, Cleveland, Columbus, Phoenix and Pittsburgh).
The survey is designed to allow for an ongoing gauge of public perception, health and economic status to see what is shifting during the pandemic. When multiple sets of data are available, it will allow for the tracking of how issues ranging from COVID-19 symptoms to economic status change over time.
The survey is focused on three core areas of research:
Instead, use our queries linked below or statistical software such as R or SPSS to weight the data.
If you'd like to create a table to see how people nationally or in your state or city feel about a topic in the survey, use the survey questionnaire and codebook to match a question (the variable label) to a variable name. For instance, "How often have you felt lonely in the past 7 days?" is variable "soc5c".
Nationally: Go to this query and enter soc5c as the variable. Hit the blue Run Query button in the upper right hand corner.
Local or State: To find figures for that response in a specific state, go to this query and type in a state name and soc5c as the variable, and then hit the blue Run Query button in the upper right hand corner.
The resulting sentence you could write out of these queries is: "People in some states are less likely to report loneliness than others. For example, 66% of Louisianans report feeling lonely on none of the last seven days, compared with 52% of Californians. Nationally, 60% of people said they hadn't felt lonely."
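Equivalently, a weighted tabulation can be computed locally; the sketch below is a hedged pandas example in which both the file name and the weight column name are placeholders that must be taken from the delivered files and the codebook.

```python
import pandas as pd

# Placeholder file and weight-column names; use the delivered data file and the
# weight variable documented in the codebook.
df = pd.read_csv("01_April_30_covid_impact_survey.csv")

weighted_pct = (df.groupby("soc5c")["national_weight"].sum()
                / df["national_weight"].sum() * 100).round(1)
print(weighted_pct)   # weighted percentage of respondents per soc5c response category
```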
The margin of error for the national and regional surveys is found in the attached methods statement. You will need the margin of error to determine if the comparisons are statistically significant. If the difference is:
The survey data will be provided under embargo in both comma-delimited and statistical formats.
Each set of survey data will be numbered and have the date the embargo lifts in front of it in the format of: 01_April_30_covid_impact_survey. The survey has been organized by the Data Foundation, a non-profit, non-partisan think tank, and is sponsored by the Federal Reserve Bank of Minneapolis and the Packard Foundation. It is conducted by NORC at the University of Chicago, a non-partisan research organization. (NORC is not an abbreviation; it is part of the organization's formal name.)
Data for the national estimates are collected using the AmeriSpeak Panel, NORC’s probability-based panel designed to be representative of the U.S. household population. Interviews are conducted with adults age 18 and over representing the 50 states and the District of Columbia. Panel members are randomly drawn from AmeriSpeak with a target of achieving 2,000 interviews in each survey. Invited panel members may complete the survey online or by telephone with an NORC telephone interviewer.
Once all the study data have been made final, an iterative raking process is used to adjust for any survey nonresponse as well as any noncoverage or under- and oversampling resulting from the study-specific sample design. Raking variables include age, gender, census division, race/ethnicity, education, and county groupings based on county-level counts of the number of COVID-19 deaths. Demographic weighting variables were obtained from the 2020 Current Population Survey. The count of COVID-19 deaths by county was obtained from USA Facts. The weighted data reflect the U.S. population of adults age 18 and over.
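For readers unfamiliar with raking, the following is a minimal, illustrative Python sketch of iterative proportional fitting on two margins with made-up targets; NORC's production weighting involves more margins and additional adjustments.

```python
import pandas as pd

def rake(df, weight_col, margins, iterations=50):
    """Iterative raking: margins maps column name -> {category: target population share}."""
    w = df[weight_col].astype(float).copy()
    for _ in range(iterations):
        for col, targets in margins.items():
            current = w.groupby(df[col]).sum() / w.sum()          # current weighted shares
            factors = {cat: targets[cat] / current[cat] for cat in targets}
            w = w * df[col].map(factors)                          # adjust toward targets
    return w

# Hypothetical example with made-up respondents and targets:
df = pd.DataFrame({
    "gender": ["f", "m", "f", "m", "f", "m"],
    "age_grp": ["18-44", "18-44", "45+", "45+", "45+", "18-44"],
    "base_weight": [1.0] * 6,
})
targets = {"gender": {"f": 0.51, "m": 0.49}, "age_grp": {"18-44": 0.45, "45+": 0.55}}
df["raked_weight"] = rake(df, "base_weight", targets)
print(df)
```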
Data for the regional estimates are collected using a multi-mode address-based sampling (ABS) approach that allows residents of each area to complete the interview via web or with an NORC telephone interviewer. All sampled households are mailed a postcard inviting them to complete the survey either online using a unique PIN or via telephone by calling a toll-free number. Interviews are conducted with adults age 18 and over with a target of achieving 400 interviews in each region in each survey. Additional details on the survey methodology and the survey questionnaire are attached below or can be found at https://www.covid-impact.org.
Results should be credited to the COVID Impact Survey, conducted by NORC at the University of Chicago for the Data Foundation.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Verbal and Quantitative Reasoning GRE scores and percentiles were collected by querying the student database for the appropriate information. Any student records that were missing data such as GRE scores or grade point average were removed from the study before the data were analyzed. The GRE scores of entering doctoral students from 2007-2012 were collected and analyzed. A total of 528 student records were reviewed. Ninety-six records were removed because they lacked GRE scores: thirty-nine of these belonged to MD/PhD applicants who were not required to take the GRE to be reviewed for admission, and the remaining fifty-seven did not have an admissions committee score in the database. After 2011, the GRE's scoring system was changed from a scale of 200-800 points per section to 130-170 points per section. As a result, 12 more records were removed because their scores were reported on the new scoring system and therefore could not be compared with the older scores on a raw-score basis. After removal of these 108 records from our analyses, a total of 420 student records remained, which included students who were currently enrolled, left the doctoral program without a degree, or left the doctoral program with an MS degree. To maintain consistency among the participants, we removed 100 additional records so that our analyses only considered students who had graduated with a doctoral degree. In addition, thirty-nine admissions scores were identified as outliers by statistical analysis software and removed, for a final data set of 286 (see Outliers below).

Outliers. We used the automated ROUT method included in the PRISM software to test the data for the presence of outliers which could skew our data. The false discovery rate for outlier detection (Q) was set to 1%. After removing the 96 students without a GRE score, 432 students were reviewed for the presence of outliers. ROUT detected 39 outliers that were removed before statistical analysis was performed.

Sample. See the detailed description in the Participants section. Linear regression analysis was used to examine potential trends between GRE scores, GRE percentiles, normalized admissions scores or GPA and outcomes between selected student groups. The D'Agostino & Pearson omnibus and Shapiro-Wilk normality tests were used to test for normality of outcomes in the sample. The Pearson correlation coefficient was calculated to determine the relationship between GRE scores, GRE percentiles, admissions scores or GPA (undergraduate and graduate) and time to degree. Candidacy exam results were divided into students who either passed or failed the exam. A Mann-Whitney test was then used to test for statistically significant differences in mean GRE scores, percentiles, and undergraduate GPA between candidacy exam results. Other variables were also observed, such as gender, race, ethnicity, and citizenship status within the samples.

Predictive Metrics. The input variables used in this study were GPA and the scores and percentiles of applicants on both the Quantitative and Verbal Reasoning GRE sections. GRE scores and percentiles were examined to normalize variances that could occur between tests.

Performance Metrics. The output variables used in the statistical analyses of each data set were either the amount of time it took for each student to earn their doctoral degree, or the student's candidacy examination result.
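The analyses above were carried out in PRISM and related software. As a rough orientation only, the following R sketch shows the same families of tests (normality check, Pearson correlation, linear regression, Mann-Whitney) applied to a hypothetical data frame grads with columns gre_quant, gre_verbal, ugpa, time_to_degree, and candidacy. It is not the authors' code, and the ROUT outlier step (a PRISM feature) is not reproduced.

```r
# Illustrative sketch of the test families described above (not the original code).
# 'grads' and its columns are hypothetical.
shapiro.test(grads$time_to_degree)                      # normality of the outcome
cor.test(grads$gre_quant, grads$time_to_degree,
         method = "pearson")                            # Pearson correlation
summary(lm(time_to_degree ~ gre_quant + gre_verbal + ugpa,
           data = grads))                               # linear regression trends
wilcox.test(gre_quant ~ candidacy, data = grads)        # Mann-Whitney: GRE score by pass/fail
```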
This dataset contains all data and code required to clean the data, fit the models, and create the figures and tables for the laboratory experiment portion of the manuscript:

Kannan, N., Q. D. Read, and W. Zhang. 2024. A natural polymer material as a pesticide adjuvant for mitigating off-target drift and protecting pollinator health. Heliyon, in press. https://doi.org/10.1016/j.heliyon.2024.e35510.

In this dataset, we archive results from several laboratory and field trials testing different adjuvants (spray additives) that are intended to reduce particle drift, increase particle size, and slow down the particles from pesticide spray nozzles. We fit statistical models to the droplet size and speed distribution data and statistically compare different metrics between the adjuvants (sodium alginate, polyacrylamide [PAM], and control without any adjuvants).

The following files are included:
- RawDataPAMsodAlgOxfLsr.xlsx: Raw data for primary analyses
- OrganizedDataPaperRevision20240614.xlsx: Raw data to produce density plots presented in Figs. 8 and 9
- raw_data_readme.md: Markdown file with description of the raw data files
- R_code_supplement.R: All R code required to reproduce primary analyses
- R_code_supplement2.R: R code required to produce density plots presented in Figs. 8 and 9

Intermediate R output files are also included so that tables and figures can be recreated without having to rerun the data preprocessing, model fitting, and posterior estimation steps:
- pam_cleaned.RData: Data combined into clean R data frames for analysis
- velocityscaledlogdiamfit.rds: Fitted brms model object for velocity
- lnormfitreduced.rds: Fitted brms model object for diameter distribution
- emm_con_velo_diam_draws.RData: Posterior distributions of estimated marginal means for velocity
- emm_con_draws.RData: Posterior distributions of estimated marginal means for diameter distribution

The following software and package versions were used:
- R version 4.3.1
- CmdStan version 2.33.1
- R packages: brms version 2.20.5, cmdstanr version 0.5.3, fitdistrplus version 1.1-11, tidybayes version 3.0.4, emmeans version 1.8.9
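As an aid to reuse, here is a minimal R sketch showing how the archived intermediate objects listed above might be loaded and inspected without rerunning the preprocessing or model fitting. The file names are taken from the list above; the exact contents of the loaded objects are documented in the accompanying R scripts.

```r
# Minimal sketch: load archived intermediates instead of refitting
# (file names are those listed above).
library(brms)

load("pam_cleaned.RData")                            # cleaned R data frames
velo_fit <- readRDS("velocityscaledlogdiamfit.rds")  # fitted brms model for velocity
diam_fit <- readRDS("lnormfitreduced.rds")           # fitted brms model for diameter

summary(velo_fit)                                    # posterior summaries for the velocity model

load("emm_con_velo_diam_draws.RData")                # posterior draws of marginal means (velocity)
load("emm_con_draws.RData")                          # posterior draws of marginal means (diameter)
```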
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset and Octave/MATLAB code/scripts for data analysis.

Background: Methods for p-value correction are criticized for either increasing Type II error or improperly reducing Type I error. This problem is worse when dealing with thousands or even hundreds of paired comparisons between waves or images which are performed point-to-point. This text considers patterns in probability vectors resulting from multiple point-to-point comparisons between two event-related potential (ERP) waves (mass univariate analysis) to correct p-values, where clusters of significant p-values may indicate true H0 rejection.

New method: We used ERP data from normal subjects and subjects with attention deficit hyperactivity disorder (ADHD) under a cued forced two-choice test to study attention. The decimal logarithm of the p-vector (p') was convolved with a Gaussian window whose length was set as the shortest lag above which the autocorrelation of each ERP wave may be assumed to have vanished. To verify the reliability of the present correction method, we ran Monte-Carlo (MC) simulations to (1) evaluate confidence intervals of rejected and non-rejected areas of our data, (2) evaluate differences between corrected and uncorrected p-vectors or simulated ones in terms of the distribution of significant p-values, and (3) empirically verify the rate of Type I error (comparing 10,000 pairs of mixed samples with control and ADHD subjects).

Results: The present method reduced the range of p'-values that did not show covariance with neighbors (Type I and also Type II errors). The differences between the simulation or raw p-vector and the corrected p-vectors were, respectively, minimal and maximal when the window length for the p-vector convolution was set by autocorrelation.

Comparison with existing methods: Our method was less conservative, while FDR methods rejected essentially all significant p-values for the Pz and O2 channels. The MC simulations, the gold-standard method for error correction, showed a 2.78±4.83% difference (across all 20 channels) from the corrected p-vector, while the difference between the raw and corrected p-vectors was 5.96±5.00% (p = 0.0003).

Conclusion: As a cluster-based correction, the present new method, which adopts adaptive parameters to set the correction, seems to be biologically and statistically suitable for correcting p-values in mass univariate analysis of ERP waves.
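The archived analysis scripts are written in Octave/MATLAB; purely as an illustration of the core idea (smoothing the log10 p-vector with a Gaussian window whose length is tied to the autocorrelation lag), here is a short R sketch. The inputs pvec and lag0 are hypothetical, and the thresholding comment is a simplification of the full procedure.

```r
# Illustrative sketch (the archived code is Octave/MATLAB): convolve the
# log10 p-value vector with a normalized Gaussian window of length 'lag0'.
gaussian_kernel <- function(len) {
  x <- seq(-2, 2, length.out = len)
  k <- exp(-x^2 / 2)
  k / sum(k)                                     # normalize so weights sum to 1
}

smooth_log_p <- function(pvec, lag0) {
  logp <- log10(pvec)
  k <- gaussian_kernel(lag0)
  as.numeric(stats::filter(logp, k, sides = 2))  # centered moving convolution
}

# e.g., flag points where the smoothed log10 p-value stays below log10(0.05)
```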
This dataset includes all raw data and statistical software code required to reproduce the analyses and graphics in the manuscript:

McKenzie-Reynolds, P., I. A. Owolabi, G. R. Burke, A. Levi, A. M. Simmons, and Q. D. Read. 2025. Wild sources for host plant resistance to Bemisia tabaci in watermelon: insights from behavioral and chemical analyses. Crop Protection, in review. Citation pending. (ARIS log 426869)

Whitefly infestations, primarily caused by Bemisia tabaci, pose a significant threat to watermelon production, leading to severe yield losses and increased reliance on chemical pesticides. We conducted a study to evaluate the potential of the desert watermelon Citrullus colocynthis and other Citrullus species genotypes for resistance to B. tabaci using oviposition assays, vertical Y-tube olfactometer assays, and gas chromatography-mass spectrometry (GC-MS) analysis of plant volatiles. This dataset contains all the raw and processed data and statistical software code needed to reproduce the analyses and graphics in the associated manuscript. Our statistical analysis includes Bayesian generalized linear mixed models fit to the oviposition and Y-tube olfactometer datasets, with posterior distributions of the model parameters used to estimate means for each genotype and test hypotheses comparing them. In this dataset we have included CSV files of the raw data, R statistical software code contained in RMarkdown notebooks, HTML rendered output of the notebooks including all figures, tables, and textual descriptions of the results, and pre-fit model objects so that the notebooks may be rendered without refitting the models. The findings in the accompanying manuscript provide critical insights into resistance mechanisms in C. colocynthis and advance sustainable watermelon production, reducing chemical pesticide dependence and enhancing economic returns for growers.

A full description of all files included in the dataset is found in README.pdf.
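For orientation, the sketch below shows how a pre-fit brms model could be combined with emmeans to obtain genotype means and pairwise comparisons, mirroring the analysis described above. It is not taken from the archived notebooks; the file name oviposition_fit.rds and the predictor name genotype are hypothetical.

```r
# Illustrative sketch (not the archived notebooks): genotype means and pairwise
# contrasts from a pre-fit Bayesian GLMM. File and variable names are hypothetical.
library(brms)
library(emmeans)

fit <- readRDS("oviposition_fit.rds")   # hypothetical pre-fit brmsfit object
emm <- emmeans(fit, ~ genotype)         # posterior estimated marginal means per genotype
emm
contrast(emm, method = "pairwise")      # compare genotypes against each other
```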
Check out more recent versions of the R code here: https://gitlab.ub.uni-bielefeld.de/khoette/rcode_eurace
The data publication contains all resources (data, code, and statistical output) required to reproduce the results presented in "How to accelerate green technology diffusion? An agent-based approach to directed technological change with coevolving absorptive capacity" (Hötte 2019). The objective of this publication is to make the simulation model and statistical analysis transparent, reproducible, and reusable.
The publication is composed of four directories: (1) The directory "model" allows the reader to understand the implementation of the simulation model (C code), to reproduce the simulated data, and to use the model for further studies. A conceptual description and technical documentation of the model are provided in the paper mentioned above. (2) The directory "experiment_directories_and_data" contains the simulated data that are presented and discussed in the paper. These data allow the statistical analyses presented in the paper to be reproduced exactly and the general validity of the model to be checked.
(3) The directory "rcode" contains the code that was used for the statistical analyses and makes the methods transparent for the reader. (4) The directory "results" contains the output files of the statistical analyses, e.g. plots and txt-output files documenting the regression analyses.
Each directory contains a readme file with additional information about its content and instructions on how to use it.
The data provided should allow you to REPRODUCE the simulations, i.e. to produce your own simulation data that should exhibit the same patterns as those discussed in the paper. Before trying to run the model, I strongly recommend checking out the introductory and explanatory material provided by the developers of the original model: http://www.wiwi.uni-bielefeld.de/lehrbereiche/vwl/etace/Eurace_Unibi/
- my_library_functions.c: running order of vintages adjusted by using costs
- its: initial population
This model is a modified version of the Eurace@Unibi model, developed by Herbert Dawid, Simon Gemkow, Philipp Harting, Sander van der Hoog and Michael Neugart, as an extension of the research within the EU 6th Framework Project Eurace.
These data allow you to perform STATISTICAL ANALYSES with the simulation output yourself. You may use them as input to the R code.
Experiment folders contain simulation files and simulation output:
- baseline
- rand_barr
- rand_pol34_fix_barr5
- rand_pol34_rand_barr15
In principle, you should be able to reproduce the simulated data with the code provided in "model" (note that the model has stochastic components, hence the results will not be EXACTLY the same, but they should be sufficiently similar).
This documentation makes the STATISTICAL METHODS used in the paper transparent. Sorry for the inefficient code. Check whether updates are available.
Output of the analysed data. Here you find the output of the statistical analyses, i.e. regression output and Wilcoxon test results in txt format, and the plots that are used in the paper. These files can be reproduced with the R code.
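To give a concrete (but hypothetical) picture of how the pieces fit together, the sketch below reads simulation output from two of the experiment folders listed above and runs a Wilcoxon test of the kind reported in the results. The file name output.csv and the column green_share are placeholders, not actual files from the package.

```r
# Illustrative sketch only: compare an outcome between the baseline and one
# treatment experiment with a Wilcoxon test. 'output.csv' and 'green_share'
# are hypothetical placeholders for the package's actual output files/columns.
baseline <- read.csv("experiment_directories_and_data/baseline/output.csv")
barrier  <- read.csv("experiment_directories_and_data/rand_barr/output.csv")

wilcox.test(baseline$green_share, barrier$green_share)
```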
The results are documented experiment-wise, i.e. baseline, barrier strength, policy with fixed barriers, and policy with random barriers.
Particular gratitude is owed to Cord Wiljes for extensive support accompanying this data publication.