Intermediate analysis results files for analyses of neural populations in PfC: see GitHub repository link
Organic food has gained much importance due to consumers' rising environmental and health concerns. Purchase intention of organic food has been explored widely, but the repurchase intention of organic food has received little attention among researchers. It has therefore become important to explore repurchase intention among Generation Z, a generation considered more educated and more aware of rising environmental concerns. Generation Z is also more tech-savvy and brand conscious, so its impact on repurchase intention through consumer satisfaction has been explored. The data in this paper were collected from 400 respondents through a structured questionnaire in Islamabad, Pakistan. We used the PLS-SEM approach for data analysis. We found that social media influence and brand consciousness positively impact brand awareness, and that brand awareness in turn positively impacts consumer satisfaction. Moreover, consumer satisfaction positively impacts the repurchase intention of organic food. Since our study found that Generation Z is strongly influenced by social media, marketing managers must consider and address the issues consumers raise when they turn to social media with their concerns and suggestions.
This dataset contains the original data, analysis data, and a results synopsis of 12 slug tests performed in 7 wells completed in unconfined fractured bedrock near the North Shore of Lake Superior in Minnesota. Aquifers tested include extrusive and intrusive volcanic rocks and slate. Estimated hydraulic conductivities range from 10.2 to 2x10-6 feet/day. Mean and median hydraulic conductivity are 3.7 and 1.6 feet/day, respectively. The highest and lowest hydraulic conductivities were in slate and fractured lava, respectively. Compressed-air and traditional displacement-tube methods were employed. Water levels were measured with barometrically compensated (11 tests) and absolute (1 test) pressure transducers and recorded with data loggers. Test data were analyzed with AQTESOLV software using the unconfined KGS model (Hyder and others, 1994; 9 tests) and the Bouwer-Rice (1976) model (3 tests). Data files include the original recorded data, data files transformed into the form required by AQTESOLV, AQTESOLV analysis and results files, and a compilation of well information and slug-test results. All files are formatted as tab-delimited ASCII except for the AQTESOLV analysis and results files, which are proprietary .aqt and PDF files, respectively. For convenience, a Microsoft Excel file is included that contains a synopsis of the well data and slug-test results; the original recorded, transformed, and plotted slug-test data; data formats; constants and variables used in the data analysis; and notes about each test.
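For orientation, the Bouwer-Rice calculation behind 3 of these tests can be illustrated compactly. The sketch below is a minimal stand-in, not the AQTESOLV implementation used for this dataset; the well geometry, the ln(Re/rw) value, and the displacement record are all hypothetical.

```python
import numpy as np

# Hypothetical well geometry (feet); real values come from the well logs.
r_c = 0.083     # casing radius
L_e = 10.0      # length of the screened/open interval
ln_Re_rw = 2.5  # ln(Re/rw), normally read from the Bouwer-Rice empirical curves

# Hypothetical displacement record: time (days) and head displacement y (feet).
t = np.array([0.0, 0.001, 0.002, 0.004, 0.008, 0.016])
y = np.array([1.00, 0.78, 0.61, 0.37, 0.14, 0.02])

# In the Bouwer-Rice model ln(y) decays linearly with t;
# the fitted slope equals -(1/t) * ln(y0/yt).
slope, intercept = np.polyfit(t, np.log(y), 1)

# K = (rc^2 * ln(Re/rw)) / (2 * Le) * (1/t) * ln(y0/yt)
K = (r_c**2 * ln_Re_rw) / (2 * L_e) * (-slope)
print(f"Estimated K = {K:.2f} ft/day")
```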
As of 2023, most surveyed companies in the United States and Europe, or ** percent, claim to be either industry leaders in terms of data, analytics, and artificial intelligence (AI) function advancements or about the same as their industry peers.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Copies of Anaconda 3 Jupyter Notebooks and Python scripts for holistic and clustered analysis of "The Impact of COVID-19 on Technical Services Units" survey results. Data was analyzed holistically using cleaned and standardized survey results and by library type clusters. To streamline data analysis in certain locations, an off-shoot CSV file was created so data could be standardized without compromising the integrity of the parent clean file. Two Jupyter Notebooks/Python scripts are available in relation to this project: COVID_Impact_TechnicalServices_HolisticAnalysis (a holistic analysis of all survey data) and COVID_Impact_TechnicalServices_LibraryTypeAnalysis (a clustered analysis of impact by library type, clustered files available as part of the Dataverse for this project).
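As a rough illustration of the holistic-versus-clustered split described above, the sketch below groups survey responses by library type with pandas. The file name and the column names (library_type, staffing_change) are hypothetical placeholders, not the actual survey fields.

```python
import pandas as pd

# Load the cleaned survey results; file and column names here are hypothetical.
df = pd.read_csv("covid_impact_techservices_clean.csv")

# Holistic view: response shares across the whole survey.
print(df["staffing_change"].value_counts(normalize=True))

# Clustered view: the same measure broken out by library type.
by_type = (
    df.groupby("library_type")["staffing_change"]
      .value_counts(normalize=True)
      .rename("share")
      .reset_index()
)
print(by_type)
```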
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data derived from the statistical analysis is presented here.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Raw data outputs 1-18
Raw data output 1. Differentially expressed genes in AML CSCs compared with GTCs as well as in TCGA AML cancer samples compared with normal ones. This data was generated based on the results of AML microarray and TCGA data analysis.
Raw data output 2. Commonly and uniquely differentially expressed genes in AML CSC/GTC microarray and TCGA bulk RNA-seq datasets. This data was generated based on the results of AML microarray and TCGA data analysis.
Raw data output 3. Common differentially expressed genes between training and test set samples of the microarray dataset. This data was generated based on the results of AML microarray data analysis.
Raw data output 4. Detailed information on the samples of the breast cancer microarray dataset (GSE52327) used in this study.
Raw data output 5. Differentially expressed genes in breast CSCs compared with GTCs as well as in TCGA BRCA cancer samples compared with normal ones.
Raw data output 6. Commonly and uniquely differentially expressed genes in breast cancer CSC/GTC microarray and TCGA BRCA bulk RNA-seq datasets. This data was generated based on the results of breast cancer microarray and TCGA BRCA data analysis. CSC and GTC are abbreviations of cancer stem cell and general tumor cell, respectively.
Raw data output 7. Differential and common co-expression and protein-protein interaction of genes between CSC and GTC samples. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. CSC and GTC are abbreviations of cancer stem cell and general tumor cell, respectively.
Raw data output 8. Differentially expressed genes between AML dormant and active CSCs. This data was generated based on the results of AML scRNA-seq data analysis.
Raw data output 9. Uniquely expressed genes in dormant or active AML CSCs. This data was generated based on the results of AML scRNA-seq data analysis.
Raw data output 10. Intersections between the targeting transcription factors of AML key CSC genes and differentially expressed genes between AML CSCs vs GTCs and between dormant and active AML CSCs, or the uniquely expressed genes in either class of CSCs.
Raw data output 11. Targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 12. CSC-specific targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 13. The protein-protein interactions between AML key CSC genes with themselves and their targeting transcription factors. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis.
Raw data output 14. The previously confirmed associations of genes having the highest targeting desirableness and CSC-specific targeting desirableness scores with AML or other cancers' (stem) cells as well as hematopoietic stem cells. These data were generated based on PubMed database-based literature mining.
Raw data output 15. Drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 16. CSC-specific drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 17. Candidate drugs for experimental validation. These drugs were selected based on their respective (CSC-specific) drug scores. CSC is the abbreviation of cancer stem cell.
Raw data output 18. Detailed information on the samples of the AML microarray dataset GSE30375 used in this study.
The OECD Programme for International Student Assessment (PISA) surveys collected data on students' performance in reading, mathematics and science, as well as contextual information on students' background, home characteristics and school factors which could influence performance. This publication includes detailed information on how to analyse the PISA data, enabling researchers both to reproduce the initial results and to undertake further analyses. In addition to the necessary techniques, the manual also includes a detailed account of the PISA 2006 database and worked examples providing full syntax in SPSS.
Results of statistical analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Individual participant data (IPD) meta-analyses that obtain "raw" data from studies rather than summary data typically adopt a "two-stage" approach to analysis, whereby IPD within trials generate summary measures, which are combined using standard meta-analytical methods. Recently, a range of "one-stage" approaches which combine all individual participant data in a single meta-analysis have been suggested as providing a more powerful and flexible approach. However, they are more complex to implement and require statistical support. This study uses a dataset to compare "two-stage" and "one-stage" models of varying complexity, to ascertain whether results obtained from the approaches differ in a clinically meaningful way.
Methods and Findings: We included data from 24 randomised controlled trials evaluating antiplatelet agents for the prevention of pre-eclampsia in pregnancy. We performed two-stage and one-stage IPD meta-analyses to estimate the overall treatment effect and to explore potential treatment interactions whereby particular types of women and their babies might benefit differentially from receiving antiplatelets. Two-stage and one-stage approaches gave similar results, showing a benefit of using antiplatelets (relative risk 0.90, 95% CI 0.84 to 0.97). Neither approach suggested that any particular type of women benefited more or less from antiplatelets. There were no material differences in results between different types of one-stage model.
Conclusions: For these data, two-stage and one-stage approaches to analysis produce similar results. Although one-stage models offer a flexible environment for exploring model structure, and are useful where across-study patterns relating to types of participant, intervention and outcome mask similar relationships within trials, the additional insights provided by their usage may not outweigh the costs of statistical support for routine application in syntheses of randomised controlled trials. Researchers considering undertaking an IPD meta-analysis should not necessarily be deterred by a perceived need for sophisticated statistical methods when combining information from large randomised trials.
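As a concrete illustration of the two-stage approach (not code from the study itself), the sketch below pools hypothetical per-trial log relative risks with the standard DerSimonian-Laird random-effects method.

```python
import numpy as np

# Stage 1 (per trial): a log relative risk and its standard error,
# estimated within each randomised trial. Values here are hypothetical.
log_rr = np.array([-0.15, -0.05, -0.22, 0.02, -0.10])
se     = np.array([ 0.10,  0.08,  0.15, 0.12,  0.09])

# Stage 2: DerSimonian-Laird random-effects pooling.
w = 1 / se**2                        # fixed-effect weights
fixed = np.sum(w * log_rr) / np.sum(w)
Q = np.sum(w * (log_rr - fixed)**2)  # Cochran's Q heterogeneity statistic
df = len(log_rr) - 1
tau2 = max(0.0, (Q - df) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

w_star = 1 / (se**2 + tau2)          # random-effects weights
pooled = np.sum(w_star * log_rr) / np.sum(w_star)
se_pooled = np.sqrt(1 / np.sum(w_star))

rr = np.exp(pooled)
ci = np.exp(pooled + np.array([-1.96, 1.96]) * se_pooled)
print(f"Pooled RR = {rr:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f})")
```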
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Vitamin D insufficiency appears to be prevalent in SLE patients. Multiple factors potentially contribute to lower vitamin D levels, including limited sun exposure, the use of sunscreen, darker skin complexion, aging, obesity, specific medical conditions, and certain medications. The study aims to assess the risk factors associated with low vitamin D levels in SLE patients in the southern part of Bangladesh, a region noted for a high prevalence of SLE. The research additionally investigates the possible correlation between vitamin D and the SLEDAI score, seeking to understand the potential benefits of vitamin D in enhancing disease outcomes for SLE patients. The study incorporates a dataset consisting of 50 patients from the southern part of Bangladesh and evaluates their clinical and demographic data. An initial exploratory data analysis is conducted to gain insights into the data, which includes calculating means and standard deviations, performing correlation analysis, and generating heat maps. Relevant inferential statistical tests, such as the Student's t-test, are also employed. In the machine learning part of the analysis, this study utilizes supervised learning algorithms, specifically Linear Regression (LR) and Random Forest (RF). To optimize the hyperparameters of the RF model and mitigate the risk of overfitting given the small dataset, a 3-fold cross-validation strategy is implemented. The study also calculates bootstrapped confidence intervals to provide robust uncertainty estimates and further validate the approach. A comprehensive feature importance analysis is carried out using RF feature importance, permutation-based feature importance, and SHAP values. The LR model yields an RMSE of 4.83 (CI: 2.70, 6.76) and an MAE of 3.86 (CI: 2.06, 5.86), whereas the RF model achieves better results, with an RMSE of 2.98 (CI: 2.16, 3.76) and an MAE of 2.68 (CI: 1.83, 3.52). Both models identify Hb, CRP, ESR, and age as significant contributors to vitamin D level predictions. Despite the lack of a significant association between SLEDAI and vitamin D in the statistical analysis, the machine learning models suggest a potential nonlinear dependency of vitamin D on SLEDAI. These findings highlight the importance of these factors in managing vitamin D levels in SLE patients. The study concludes that there is a high prevalence of vitamin D insufficiency in SLE patients. Although a direct linear correlation between the SLEDAI score and vitamin D levels is not observed, machine learning models suggest the possibility of a nonlinear relationship. Furthermore, factors such as Hb, CRP, ESR, and age are identified as more significant in predicting vitamin D levels. Thus, the study suggests that monitoring these factors may be advantageous in managing vitamin D levels in SLE patients. Given the immunological nature of SLE, the potential role of vitamin D in SLE disease activity could be substantial. Therefore, it underscores the need for further large-scale studies to corroborate this hypothesis.
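For illustration, a minimal scikit-learn sketch of the workflow described above (3-fold cross-validation for tuning, plus a bootstrapped confidence interval for the test RMSE) follows. The file name and column names are hypothetical placeholders; this is not the study's actual code.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_squared_error

# Hypothetical file and column names; the study's predictors include Hb, CRP, ESR, and age.
df = pd.read_csv("sle_vitamin_d.csv")
X = df[["Hb", "CRP", "ESR", "age", "SLEDAI"]]
y = df["vitamin_d"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 3-fold cross-validation to tune the random forest on a small dataset.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    {"n_estimators": [100, 300], "max_depth": [2, 4, None]},
    cv=3, scoring="neg_root_mean_squared_error",
)
grid.fit(X_tr, y_tr)
pred = grid.predict(X_te)

# Bootstrapped 95% confidence interval for the test RMSE.
rng = np.random.default_rng(0)
rmses = []
for _ in range(2000):
    idx = rng.integers(0, len(y_te), len(y_te))
    rmses.append(np.sqrt(mean_squared_error(y_te.iloc[idx], pred[idx])))
lo, hi = np.percentile(rmses, [2.5, 97.5])
print(f"RMSE = {np.sqrt(mean_squared_error(y_te, pred)):.2f} (CI: {lo:.2f}, {hi:.2f})")
```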
Project Title: Cancer Data Analysis for Improved Healthcare
Description:
Our Cancer Data Analysis project is a comprehensive effort aimed at harnessing the power of data to advance our understanding of cancer, improve patient care, and contribute to ongoing research in oncology. This project brings together a multidisciplinary team of researchers, data scientists, and healthcare professionals committed to making a positive impact on the fight against cancer.
Project Objectives:
Data Collection: We have compiled a diverse and extensive dataset containing information on cancer incidence, patient demographics, treatment outcomes, genomic profiles, and more. This dataset represents a valuable resource for researchers and healthcare providers.
Insights and Trends: Through advanced data analysis techniques, we aim to uncover meaningful insights into cancer trends, including the prevalence of different cancer types, regional variations, and changes over time. These insights can inform healthcare policies and resource allocation.
Treatment Optimization: By analyzing treatment outcomes and patient responses to various therapies, we aim to identify patterns that can help tailor cancer treatment plans to individual patient needs, ultimately improving survival rates and quality of life.
Epidemiological Insights: We analyze epidemiological data to track the spread of cancer.
Impact:
The Cancer Data Analysis project aspires to make a significant impact on cancer research, clinical practice, and public health initiatives. By providing valuable data and insights, we hope to contribute to:
Early cancer detection and diagnosis
Improved treatment protocols
Enhanced patient care and support
Informed healthcare policy decisions
Accelerated research breakthroughs
Collaboration:
We welcome collaboration with fellow researchers, healthcare professionals, and organizations committed to the fight against cancer. Together, we can leverage data-driven approaches to drive positive change in the field of oncology.
Join us in our mission to combat cancer through data-driven insights and innovative solutions. Together, we can make a difference in the lives of cancer patients and their families.
Scientific investigation is of value only insofar as relevant results are obtained and communicated, a task that requires organizing, evaluating, analysing and unambiguously communicating the significance of data. In this context, working with ecological data, reflecting the complexities and interactions of the natural world, can be a challenge. Recent innovations for statistical analysis of multifaceted interrelated data make obtaining more accurate and meaningful results possible, but key decisions of the analyses to use, and which components to present in a scientific paper or report, may be overwhelming. We offer a 10-step protocol to streamline analysis of data that will enhance understanding of the data, the statistical models and the results, and optimize communication with the reader with respect to both the procedure and the outcomes. The protocol takes the investigator from study design and organization of data (formulating relevant questions, visualizing data collection, data...
https://creativecommons.org/publicdomain/zero/1.0/
This is a historical dataset on the modern Olympic Games, including all the Games from Athens 1896 to Rio 2016. I scraped this data from www.sports-reference.com in May 2018. The R code I used to scrape and wrangle the data is on GitHub. I recommend checking my kernel before starting your own analysis.
Note that the Winter and Summer Games were held in the same year up until 1992. After that, they staggered them such that Winter Games occur on a four-year cycle starting with 1994, then Summer in 1996, then Winter in 1998, and so on. A common mistake people make when analyzing this data is to assume that the Summer and Winter Games have always been staggered.
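A quick way to avoid that mistake is to inspect the editions by year and season before aggregating. The sketch below assumes the Year, Season, and Games columns exist under those names in athlete_events.csv.

```python
import pandas as pd

df = pd.read_csv("athlete_events.csv")

# Count distinct Games editions per year and season: before 1994 the Summer
# and Winter Games share years; from 1994 on they alternate on 2-year offsets.
editions = df.groupby(["Year", "Season"])["Games"].nunique().unstack(fill_value=0)
print(editions.loc[1988:2000])

# When comparing across time, filter by Season rather than assuming the
# Games were always staggered.
summer = df[df["Season"] == "Summer"]
```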
The file athlete_events.csv contains 271116 rows and 15 columns. Each row corresponds to an individual athlete competing in an individual Olympic event (athlete-events). The columns are:
The Olympic data on www.sports-reference.com is the result of an incredible amount of research by a group of Olympic history enthusiasts and self-proclaimed 'statistorians'. Check out their blog for more information. All I did was consolidate their decades of work into a convenient format for data analysis.
This dataset provides an opportunity to ask questions about how the Olympics have evolved over time, including questions about the participation and performance of women, different nations, and different sports and events.
Resources for Advanced Data Analysis and Visualization
Researchers who have access to the latest analysis and visualization tools are able to use large amounts of complex data to find efficiencies in projects, designs, and resources. The Data Analysis and Assessment Center (DAAC) at ERDC's Information Technology Laboratory (ITL) provides visualization and analysis tools and support services to enable the analysis of an ever-increasing volume of data.
Simplify Data Analysis and Visualization Research
The resources provided by the DAAC enable any user to conduct important data analysis and visualization that provides valuable insight into projects and designs and helps to find ways to save resources. The DAAC provides new tools like ezVIZ, and services such as the DAAC website, a rich resource of news about the DAAC, training materials, a community forum, and tutorials on data analysis and other issues.
The DAAC can perform collaborative work when users prefer to do the work themselves but need help in choosing a visualization program and/or technique and in using the visualization tools. The DAAC also carries out custom projects to produce high-quality animations of data, such as movies, which allow researchers to communicate their results to others.
Communicate Research in Context
The DAAC provides leading animation and modeling software that allows scientists and researchers to communicate all aspects of their research by setting their results in context through conceptual visualization and data analysis.
Success Stories
Wave Breaking and Associated Droplet and Bubble Formation
Wave breaking and associated droplet and bubble formation are among the most challenging problems in the field of free-surface hydrodynamics. The method of computational fluid dynamics (CFD) was used to solve this problem numerically for flow about naval vessels. The researchers wanted to animate the time-varying three-dimensional data sets using isosurfaces, but transferring the data back to the local site was a problem because the data sets were large. The DAAC visualization team solved the problem by using EnSight and ezVIZ to generate the isosurfaces, and photorealistic rendering software to produce the images for the animation.
Explosive Structure Interaction Effects in Urban Terrain
Known as the Breaching Project, this research studied the effects of high-explosive (HE) charges on brick or reinforced concrete walls. The results of this research will enable the war fighter to breach a wall to enter a building where enemy forces are conducting operations against U.S. interests. Images produced show computed damage caused by an HE charge on the outer and inner sides of a reinforced concrete wall. The ability to quickly and meaningfully analyze large simulation data sets helps guide further development of new HE package designs and better ways to deploy the HE packages. A large number of designs can be simulated and analyzed to find the best at breaching the wall. The project saves money in greatly reduced field-test costs by testing only the designs identified in analysis as the best performers.
Specifications
Amethyst, the seven-node Linux visualization cluster housed at the DAAC, is supported by ParaView, EnSight, and ezVIZ visualization tools and configured as follows:
Six compute nodes, each with the following specifications:
CPU: 8 dual-core 2.4 GHz, 64-bit AMD Opteron processors (16 effective cores)
Memory: 128-GB RAM
Video: NVIDIA Quadro 5500 with 1-GB memory
Network: InfiniBand interconnect between nodes, and Gigabit Ethernet to the Defense Research and Engineering Network (DREN)
One storage node:
Disk Space: 20-TB TerraGrid file system, mounted on all nodes as /viz and /work
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We gathered data from notable sources such as Threat Group Cards by ThaiCERT, Malpedia by Fraunhofer FKIE, MITRE ATT&CK, and the Industrial Control Systems Cyber Emergency Response Team (ICS-CERT) and compiled publicly available information.
From these sources, we compiled information on 120 threat groups targeting OT/ICS environments in industrial sectors such as manufacturing, energy, oil and gas, industrial, petrochemical, and critical infrastructure.
The DART team is responsible for fulfilling ad hoc data requests that come in to the Analysis Division, FMCSA. The DART system tracks these requests, stores any coding and results, and performs internal reporting about requests received.
This graph presents the result of a worldwide survey of senior executives, conducted by Accenture, into the impact of big data analytics on company supply chains in 2014. In 2014, ** percent of respondents stated that their company had achieved an improvement in customer service and demand fulfillment of ** percent or greater using big data analytics.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This artifact accompanies the SEET@ICSE article "Assessing the impact of hints in learning formal specification", which reports on a user study to investigate the impact of different types of automated hints while learning a formal specification language, both in terms of immediate performance and learning retention and in terms of the emotional response of the students. This research artifact provides all the material required to replicate this study (except for the proprietary questionnaires passed to assess the emotional response and user experience), as well as the collected data and the data analysis scripts used for the discussion in the paper.
Dataset
The artifact contains the resources described below.
Experiment resources
The resources needed for replicating the experiment, namely in directory experiment:
alloy_sheet_pt.pdf: the 1-page Alloy sheet that participants had access to during the 2 sessions of the experiment. The sheet was passed in Portuguese due to the population of the experiment.
alloy_sheet_en.pdf: a version of the 1-page Alloy sheet that participants had access to during the 2 sessions of the experiment, translated into English.
docker-compose.yml: a Docker Compose configuration file to launch Alloy4Fun populated with the tasks in directory data/experiment for the 2 sessions of the experiment.
api and meteor: directories with source files for building and launching the Alloy4Fun platform for the study.
Experiment data
The task database used in our application of the experiment, namely in directory data/experiment:
Model.json, Instance.json, and Link.json: JSON files used to populate Alloy4Fun with the tasks for the 2 sessions of the experiment.
identifiers.txt: the list of all (104) available identifiers for participants that can take part in the experiment.
Collected data
Data collected in the application of the experiment as a simple one-factor randomised experiment in 2 sessions involving 85 undergraduate students majoring in CSE. The experiment was validated by the Ethics Committee for Research in Social and Human Sciences of the Ethics Council of the University of Minho, where the experiment took place. Data is shared in the shape of JSON and CSV files with a header row, namely in directory data/results:
data_sessions.json: data collected from task-solving in the 2 sessions of the experiment, used to calculate variables productivity (PROD1 and PROD2, between 0 and 12 solved tasks) and efficiency (EFF1 and EFF2, between 0 and 1).
data_socio.csv: data collected from socio-demographic questionnaire in the 1st session of the experiment, namely:
participant identification: participant's unique identifier (ID);
socio-demographic information: participant's age (AGE), sex (SEX, 1 through 4 for female, male, prefer not to disclose, and other, respectively), and average academic grade (GRADE, from 0 to 20; NA denotes preference not to disclose).
data_emo.csv: detailed data collected from the emotional questionnaire in the 2 sessions of the experiment, namely:
participant identification: participant's unique identifier (ID) and the assigned treatment (column HINT, either N, L, E or D);
detailed emotional response data: the differential in the 5-point Likert scale for each of the 14 measured emotions in the 2 sessions, ranging from -5 to -1 if decreased, 0 if maintained, from 1 to 5 if increased, or NA denoting failure to submit the questionnaire. Half of the emotions are positive (Admiration1 and Admiration2, Desire1 and Desire2, Hope1 and Hope2, Fascination1 and Fascination2, Joy1 and Joy2, Satisfaction1 and Satisfaction2, and Pride1 and Pride2), and half are negative (Anger1 and Anger2, Boredom1 and Boredom2, Contempt1 and Contempt2, Disgust1 and Disgust2, Fear1 and Fear2, Sadness1 and Sadness2, and Shame1 and Shame2). This detailed data was used to compute the aggregate data in data_emo_aggregate.csv and in the detailed discussion in Section 6 of the paper.
data_umux.csv: data collected from the user experience questionnaires in the 2 sessions of the experiment, namely:
participant identification: participant's unique identifier (ID);
user experience data: summarised user experience data from the UMUX surveys (UMUX1 and UMUX2, as a usability metric ranging from 0 to 100).
participants.txt: the list of participant identifiers that have registered for the experiment.
Analysis scripts
The analysis scripts required to replicate the analysis of the results of the experiment as reported in the paper, namely in directory analysis:
analysis.r: An R script to analyse the data in the provided CSV files; each performed analysis is documented within the file itself.
requirements.r: An R script to install the required libraries for the analysis script.
normalize_task.r: A Python script to normalize the task JSON data from file data_sessions.json into the CSV format required by the analysis script.
normalize_emo.r: A Python script to compute the aggregate emotional response in the CSV format required by the analysis script from the detailed emotional response data in the CSV format of data_emo.csv (a minimal sketch of one possible aggregation is shown after this list).
Dockerfile: Docker script to automate the analysis script from the collected data.
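For readers who want a feel for what the aggregation step does, below is a minimal sketch of one possible aggregation of data_emo.csv: averaging the seven positive and seven negative emotion differentials per session. Whether normalize_emo.r averages, sums, or weights the differentials is an assumption here; consult the script itself for the authoritative computation.

```python
import pandas as pd

# Sketch only: the actual aggregation lives in normalize_emo.r.
df = pd.read_csv("data_emo.csv")

positive = ["Admiration", "Desire", "Hope", "Fascination", "Joy", "Satisfaction", "Pride"]
negative = ["Anger", "Boredom", "Contempt", "Disgust", "Fear", "Sadness", "Shame"]

# Mean differential per valence and session (NAs are skipped by pandas).
for session in ("1", "2"):
    df[f"POS{session}"] = df[[e + session for e in positive]].mean(axis=1)
    df[f"NEG{session}"] = df[[e + session for e in negative]].mean(axis=1)

# Hypothetical output name, to avoid clashing with the artifact's own file.
df[["ID", "HINT", "POS1", "NEG1", "POS2", "NEG2"]].to_csv(
    "data_emo_aggregate_sketch.csv", index=False)
```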
Setup
To replicate the experiment and the analysis of the results, only Docker is required.
If you wish to manually replicate the experiment and collect your own data, you'll need to install:
A modified version of the Alloy4Fun platform, which is built in the Meteor web framework. This version of Alloy4Fun is publicly available in branch study of its repository at https://github.com/haslab/Alloy4Fun/tree/study.
If you wish to manually replicate the analysis of the data collected in our experiment, you'll need to install:
Python to manipulate the JSON data collected in the experiment. Python is freely available for download at https://www.python.org/downloads/, with distributions for most platforms.
R software for the analysis scripts. R is freely available for download at https://cran.r-project.org/mirrors.html, with binary distributions available for Windows, Linux and Mac.
Usage
Experiment replication
This section describes how to replicate our user study experiment, and collect data about how different hints impact the performance of participants.
To launch the Alloy4Fun platform populated with tasks for each session, just run the following commands from the root directory of the artifact. The Meteor server may take a few minutes to launch, wait for the "Started your app" message to show.
cd experiment
docker-compose up
This will launch Alloy4Fun at http://localhost:3000. The tasks are accessed through permalinks assigned to each participant. The experiment allows for up to 104 participants, and the list of available identifiers is given in file identifiers.txt. The group of each participant is determined by the last character of the identifier, either N, L, E or D. The task database can be consulted in directory data/experiment, in Alloy4Fun JSON files.
In the 1st session, each participant was given one permalink that gives access to 12 sequential tasks. The permalink is simply the participant's identifier, so participant 0CAN would just access http://localhost:3000/0CAN. The next task is available after a correct submission to the current task or when a time-out occurs (5mins). Each participant was assigned to a different treatment group, so depending on the permalink different kinds of hints are provided. Below are 4 permalinks, each for each hint group:
Group N (no hints): http://localhost:3000/0CAN
Group L (error locations): http://localhost:3000/CA0L
Group E (counter-example): http://localhost:3000/350E
Group D (error description): http://localhost:3000/27AD
In the 2nd session, as in the 1st session, each permalink gave access to 12 sequential tasks, and the next task is available after a correct submission or a time-out (5mins). The permalink is constructed by prepending the participant's identifier with P-. So participant 0CAN would just access http://localhost:3000/P-0CAN. In the 2nd session all participants were expected to solve the tasks without any hints provided, so the permalinks from different groups are undifferentiated.
Before the 1st session the participants should answer the socio-demographic questionnaire, which should ask for the following information: unique identifier, age, sex, familiarity with the Alloy language, and average academic grade.
Before and after both sessions the participants should answer the standard PrEmo 2 questionnaire. PrEmo 2 is published under an Attribution-NonCommercial-NoDerivatives 4.0 International Creative Commons licence (CC BY-NC-ND 4.0). This means that you are free to use the tool for non-commercial purposes as long as you give appropriate credit, provide a link to the license, and do not modify the original material. The original material, namely the depictions of the different emotions, can be downloaded from https://diopd.org/premo/. The questionnaire should ask for the unique user identifier, and for the attachment with each of the depicted 14 emotions, expressed in a 5-point Likert scale.
After both sessions the participants should also answer the standard UMUX questionnaire. This questionnaire can be used freely, and should ask for the user unique identifier and answers for the standard 4 questions in a 7-point Likert scale. For information about the questions, how to implement the questionnaire, and how to compute the usability metric ranging from 0 to 100 score from the answers, please see the original paper:
Kraig Finstad. 2010. The usability metric for user experience. Interacting with computers 22, 5 (2010), 323–327.
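For orientation, the sketch below computes the UMUX score following Finstad (2010), where odd items are positively worded and even items negatively worded on the 7-point scale; treat it as a sketch and defer to the original paper for the authoritative procedure.

```python
def umux_score(q1, q2, q3, q4):
    """Compute the UMUX score (0-100) from four 7-point Likert answers.

    Odd items are positively worded (contribution = answer - 1) and even
    items negatively worded (contribution = 7 - answer), per Finstad (2010).
    """
    contributions = [(q1 - 1), (7 - q2), (q3 - 1), (7 - q4)]
    return sum(contributions) / 24 * 100

print(umux_score(6, 2, 7, 1))  # -> 91.67 for a very positive response
```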
Analysis of other applications of the experiment
This section describes how to replicate the analysis of the data collected in an application of the experiment described in Experiment replication.
The analysis script expects data in 4 CSV files,
https://creativecommons.org/publicdomain/zero/1.0/
The CSV file contains aggregated data on the results of the experiment (user_id), treatment type (group), and key user metrics (views and clicks). The task is to analyze the results of the experiment and write your recommendations.
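A minimal sketch of such an analysis follows: it computes the click-through rate per group and runs a two-proportion z-test. The file name is hypothetical, and the sketch assumes exactly two treatment groups.

```python
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical file name; columns follow the description above.
df = pd.read_csv("experiment_results.csv")

# Click-through rate per treatment group.
summary = df.groupby("group")[["views", "clicks"]].sum()
summary["ctr"] = summary["clicks"] / summary["views"]
print(summary)

# Two-proportion z-test on clicks out of views between the two groups.
stat, pvalue = proportions_ztest(summary["clicks"], summary["views"])
print(f"z = {stat:.2f}, p = {pvalue:.4f}")
```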