Companion data for the creation of a banksia plot

Background: In research evaluating statistical analysis methods, a common aim is to compare point estimates and confidence intervals (CIs) calculated from different analyses. This can be challenging when the outcomes (and their scale ranges) differ across datasets. We therefore developed a plot to facilitate pairwise comparisons of point estimates and confidence intervals from different statistical analyses, both within and across datasets.

Methods: The plot was developed and refined over the course of an empirical study. To compare results from a variety of different studies, a system of centring and scaling is used. First, the point estimates from the reference analyses are centred to zero and their confidence intervals are scaled to span a range of one. The point estimates and confidence intervals from the matching comparator analyses are then adjusted by the same amounts. This enables the relative positions of the point estimates and the CI widths to be assessed quickly while maintaining the relative magnitudes of the differences in point estimates and CI widths between the two analyses. Banksia plots can be graphed in a matrix, showing all pairwise comparisons of multiple analyses. In this paper, we show how to create a banksia plot and present two examples: the first relates to an empirical evaluation assessing the differences between various statistical methods across 190 interrupted time series (ITS) datasets with widely varying characteristics, while the second assesses data extraction accuracy by comparing results obtained from analysing original study data (43 ITS studies) with those obtained by four researchers from datasets digitally extracted from graphs in the accompanying manuscripts.

Results: In the banksia plot of the statistical method comparison, it was clear that there was no difference, on average, in point estimates, and it was straightforward to ascertain which methods resulted in smaller, similar or larger confidence intervals than others. In the banksia plot comparing analyses from digitally extracted data with those from the original data, it was clear that both the point estimates and confidence intervals were very similar across data extractors and the original data.

Conclusions: The banksia plot, a graphical representation of centred and scaled confidence intervals, provides a concise summary of comparisons between multiple point estimates and associated CIs in a single graph. Through this visualisation, patterns and trends in the point estimates and confidence intervals can be easily identified.

This collection of files allows the user to create the images used in the companion paper and to amend the code to create their own banksia plots, using either Stata version 17 or R version 4.3.1.
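As a loose illustration of the centring and scaling described in the Methods (a minimal R sketch with made-up numbers, not the code supplied in this collection), the reference point estimate is subtracted from both analyses' results and everything is divided by the reference CI width, so the reference interval spans a range of one centred on zero while the comparator keeps its relative position and relative width:

# minimal sketch of the centring/scaling step; argument names are illustrative only
centre_and_scale <- function(ref_est, ref_lci, ref_uci,
                             comp_est, comp_lci, comp_uci) {
  width <- ref_uci - ref_lci                   # reference CI width
  scale1 <- function(x) (x - ref_est) / width  # centre on reference estimate, scale by its CI width
  data.frame(
    analysis = c("reference", "comparator"),
    est = scale1(c(ref_est, comp_est)),
    lci = scale1(c(ref_lci, comp_lci)),
    uci = scale1(c(ref_uci, comp_uci))
  )
}

# example: the reference CI becomes (-0.5, 0.5) around 0; the comparator keeps its
# relative position and relative width after the same adjustment
centre_and_scale(ref_est = 1.2, ref_lci = 0.8, ref_uci = 1.6,
                 comp_est = 1.4, comp_lci = 0.9, comp_uci = 1.9)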
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends

TSMx is an R script that was developed to facilitate multi-temporal-scale visualizations of time-series data. The script requires only a two-column CSV of years and values to plot the slope of the linear regression line for all possible year combinations from the supplied temporal range. The outputs include a time-series matrix showing slope direction based on the linear regression, slope values plotted with colors indicating magnitude, and results of a Mann-Kendall test. The start year is indicated on the y-axis and the end year is indicated on the x-axis. In the example below, the cell in the top-right corner is the direction of the slope for the temporal range 2001–2019. The red line corresponds with the temporal range 2010–2019 and an arrow is drawn from the cell that represents that range. One cell is highlighted with a black border to demonstrate how to read the chart: that cell represents the slope for the temporal range 2004–2014.

This publication entry also includes an Excel template that produces the same visualizations without the need to interact with any code, though minor modifications will be needed to accommodate year ranges other than the one provided. TSMx for R was developed by Georgios Boumis; TSMx was originally conceptualized and created by Brad G. Peter in Microsoft Excel. Please refer to the associated publication: Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624. https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624

TSMx sample chart from the supplied Excel template. Data represent the productivity of rice agriculture in Vietnam as measured via EVI (enhanced vegetation index) from the NASA MODIS data product (MOD13Q1.V006).
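To make the reading of a single cell concrete, the short sketch below (assuming a two-column EVI.csv with "Year" and "Value" columns, as the script requires) computes the regression slope and Mann-Kendall p-value for the highlighted 2004–2014 range directly; the full script that fills the whole matrix follows.

library(readr)
library(Kendall)

data <- read_csv("EVI.csv")                                  # columns: Year, Value
batch <- data[data$Year >= 2004 & data$Year <= 2014, ]       # one start/end combination

slope <- coefficients(lm(Value ~ Year, data = batch))[[2]]   # one cell of the slope matrix
pval  <- MannKendall(batch$Value)[["sl"]]                    # MK significance (n >= 8 here)
c(slope = slope, mk_pvalue = pval)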
TSMx R script:

# import packages
library(dplyr)
library(readr)
library(ggplot2)
library(tibble)
library(tidyr)
library(forcats)
library(Kendall)

options(warn = -1) # disable warnings

# read data (.csv file with "Year" and "Value" columns)
data <- read_csv("EVI.csv")

# prepare row/column names for output matrices
years <- data %>% pull("Year")
r.names <- years[-length(years)]
c.names <- years[-1]
years <- years[-length(years)]

# initialize output matrices
sign.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
pval.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
slope.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))

# function to return remaining years given a start year
getRemain <- function(start.year) {
  years <- data %>% pull("Year")
  start.ind <- which(data[["Year"]] == start.year) + 1
  remain <- years[start.ind:length(years)]
  return(remain)
}

# function to subset data for a start/end year combination
splitData <- function(end.year, start.year) {
  keep <- which(data[['Year']] >= start.year & data[['Year']] <= end.year)
  batch <- data[keep, ]
  return(batch)
}

# function to fit linear regression and return slope direction
fitReg <- function(batch) {
  trend <- lm(Value ~ Year, data = batch)
  slope <- coefficients(trend)[[2]]
  return(sign(slope))
}

# function to fit linear regression and return slope magnitude
fitRegv2 <- function(batch) {
  trend <- lm(Value ~ Year, data = batch)
  slope <- coefficients(trend)[[2]]
  return(slope)
}

# function to implement Mann-Kendall (MK) trend test and return significance
# the test is implemented only for n >= 8
getMann <- function(batch) {
  if (nrow(batch) >= 8) {
    mk <- MannKendall(batch[['Value']])
    pval <- mk[['sl']]
  } else {
    pval <- NA
  }
  return(pval)
}

# function to return slope direction for all combinations given a start year
getSign <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  signs <- lapply(combs, fitReg)
  return(signs)
}

# function to return MK significance for all combinations given a start year
getPval <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  pvals <- lapply(combs, getMann)
  return(pvals)
}

# function to return slope magnitude for all combinations given a start year
getMagn <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  magns <- lapply(combs, fitRegv2)
  return(magns)
}

# retrieve slope direction, MK significance, and slope magnitude
signs <- lapply(years, getSign)
pvals <- lapply(years, getPval)
magns <- lapply(years, getMagn)

# fill in output matrices
dimension <- nrow(sign.matrix)
for (i in 1:dimension) {
  sign.matrix[i, i:dimension] <- unlist(signs[i])
  pval.matrix[i, i:dimension] <- unlist(pvals[i])
  slope.matrix[i, i:dimension] <- unlist(magns[i])
}
sign.matrix <-...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A critical issue in intelligent building control is detecting energy consumption anomalies from the status data of intelligent devices. The building sector is plagued by energy consumption anomalies caused by many factors, many of which are related to one another through apparent temporal relationships. Most traditional detection methods rely solely on a single energy consumption variable and its changes over time, so they cannot examine the correlations between the multiple characteristic factors that affect energy consumption anomalies, or their relationships over time; the resulting anomaly detection is one-sided. To address these problems, this paper proposes an anomaly detection method based on multivariate time series. First, to extract the correlations between the different feature variables affecting energy consumption, the paper introduces a graph convolutional network to build an anomaly detection framework. Second, because the feature variables influence each other to different degrees, the framework is enhanced with a graph attention mechanism so that time-series features with greater influence on energy consumption receive larger attention weights, resulting in better detection of building energy consumption anomalies. Finally, the effectiveness of the proposed method and of existing methods for detecting energy consumption anomalies in smart buildings is compared on standard datasets. The experimental results show that the model achieves better detection accuracy.
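The abstract above is high-level; purely as a toy illustration of the attention-weighting idea it describes (base R, made-up data, and in no way the authors' model or code), a single graph-attention-style layer over multivariate time-series features might look like this, where each node is one measured variable and its feature vector is a recent window of readings:

set.seed(1)
n_vars <- 5    # number of feature variables (nodes)
window <- 8    # length of the time window used as node features
d_out  <- 4    # output dimension of the layer

X <- matrix(rnorm(n_vars * window), nrow = n_vars)           # node features: one row per variable
A <- matrix(1, n_vars, n_vars)                               # fully connected graph: every variable may influence every other
W <- matrix(rnorm(window * d_out, sd = 0.1), nrow = window)  # shared linear transform
a <- rnorm(2 * d_out, sd = 0.1)                              # attention parameter vector

H <- X %*% W                                                 # transformed node features (n_vars x d_out)
leaky_relu <- function(z) ifelse(z > 0, z, 0.2 * z)

# attention logit e[i, j] = LeakyReLU(a' [h_i || h_j]), masked by the adjacency matrix
E <- matrix(-Inf, n_vars, n_vars)
for (i in 1:n_vars) for (j in 1:n_vars) {
  if (A[i, j] == 1) E[i, j] <- leaky_relu(sum(a * c(H[i, ], H[j, ])))
}
alpha <- t(apply(E, 1, function(e) exp(e) / sum(exp(e))))    # row-wise softmax: attention weights
H_new <- alpha %*% H                                         # attention-weighted aggregation of neighbouring variables

round(alpha, 2)  # larger entries = neighbouring variables given more weight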
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Blocks? Graphs? Why Not Both? Designing and Evaluating a Hybrid Programming Environment for End-users: Replication Package
This repository contains supplementary materials for the paper "Blocks? Graphs? Why Not Both? Designing and Evaluating a Hybrid Programming Environment for End-users". We provide this data for transparency reasons and to support replications of our experiments.
Note: This package is anonymized for peer review purposes. We will provide contact information for the authors at a later date. We also plan to add interactive versions of our tasks and tutorials in an updated version to allow readers easier exploration/experimentation.
Summary of files contained in this package
This package contains two parts:
The data-analysis/ folder contains the raw dataset we collected for our experiment in CSV format, as well as the scripts we used for our analyses.

ID contains a unique 4-digit identifier assigned to each participant throughout our study.

Group contains the group (Blocks/Graph) that participants were randomly assigned to.

Task1Time and Task2Time contain the time, in minutes, that participants spent completing the two programming tasks of our study.

Task1Success and Task2Success contain a boolean value indicating whether the participant successfully completed the given task. Note that participants had unlimited attempts until they timed out at a strict time limit of 30 minutes, so if a participant was unsuccessful the corresponding time value is 30.

Task1Tests and Task2Tests contain the number of times a participant executed their code during a task, including their final submission if they were successful.

LearnTask, ReadTask and WriteTask contain the scores that participants gave to the task editor component of their assigned programming environment. There are three scores, for the categories "learnability", "readability" and "writability", each on a 5-point scale from 1 (worst) to 5 (best).

LearnTrig, ReadTrig and WriteTrig contain the scores that participants gave to the trigger editor component of their assigned programming environment, using the same three categories and the same 5-point scale.

LearnComp, ReadComp and WriteComp contain the scores that participants gave to their assigned programming environment in direct comparison to the other alternative, again for "learnability", "readability" and "writability". Unlike in the paper, where scores are on a scale from -2 to 2, the raw scores here are on a 5-point scale from 1 (strong preference for the other environment) to 5 (strong preference for their own environment).

survival.py was used to perform the survival analysis presented in the paper and to generate the related figure (an illustrative R sketch of this setup appears just below).

batplot.py was used to generate the 3x3 grid of ratings used in a figure in the paper.
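Because of the 30-minute time-out described above, unsuccessful attempts can be treated as right-censored observations. Purely as an illustration (the repository's own analysis is survival.py, in Python; this is an analogous sketch in R using the survival package, with a hypothetical CSV file name standing in for the raw data file in data-analysis/):

library(survival)

# hypothetical file name; the actual raw data CSV is in the data-analysis/ folder
d <- read.csv("data-analysis/results.csv")

# completion event: TRUE when the task was finished within the limit;
# unsuccessful participants are right-censored at the strict 30-minute cap,
# which is also their recorded Task1Time value
event <- as.logical(d$Task1Success)

fit <- survfit(Surv(d$Task1Time, event) ~ d$Group)
summary(fit)
plot(fit, xlab = "Minutes", ylab = "Proportion still working on Task 1")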
The materials/ folder contains the tutorials and task descriptions we presented to study participants. It also contains the exact wording of the pre-screening and post-experimental survey questions.
pre-screening.png shows the three pre-screening questions we used to determine whether our participants could be included in our study.

tutorial1_instructions.png and tutorial1_sim.png contain the instructions and initial simulator state we provided to participants for the first programming tutorial. This tutorial did not provide starter code and was identical for both participant groups.

tutorial2_instructions.png and tutorial2_sim.png contain the instructions and initial simulator state we provided to participants for the second programming tutorial. This tutorial was identical for both participant groups and provided participants with starter code, which is shown in the images:
tutorial2_code_main.png for the main program in the left canvas
tutorial2_code_move.png for the definition of "Move box to the right"

tutorial3_instructions_blocks.png / tutorial3_instructions_graph.png and tutorial3_sim.png contain the instructions and initial simulator state we provided to participants for the third programming tutorial. This tutorial also provided participants with starter code, which is shown in the images:
tutorial3_code_main.png for the main program in the left canvas
tutorial3_code_pick.png for the definition of "Pick up box"
tutorial3_code_place.png for the definition of "Place box"

task1_instructions.png and task1_sim.png contain the instructions and initial simulator state we provided to participants for the first programming task. The task did not provide starter code and the instructions were identical for both participant groups.

task2_instructions.png and task2_sim.png contain the instructions and initial simulator state we provided to participants for the second programming task. The instructions were identical for both groups. This task also provided participants with starter code, which is shown in the images:
task2_code_main.png for the main program in the left canvas
task2_code_pick_prog.png for the definition of "Pick up block"
task2_code_load_trig_blocks.png / task2_code_load_trig_graph.png for the definition of the trigger "Ready to load machine"
task2_code_load_prog.png for the definition of "Load and activate machine"
task2_code_finished_trig_blocks.png / task2_code_finished_trig_graph.png for the definition of the trigger "Machine finished"
task2_code_finished_prog1.png for the definition of "Get block from machine"
task2_code_finished_prog2.png for the definition of "Place block in bin"

usability.png shows the usability questions we used to determine a participant's rating of their assigned programming environment. The questions were identical for both participant groups.

comprehension_blocks_1.png and comprehension_blocks_2.png show the program comprehension questions we used to determine whether participants in the Blocks group could understand more complex triggers.

comprehension_graph_1.png and comprehension_graph_2.png show the program comprehension questions we used to determine whether participants in the Graph group could understand more complex triggers.

comparison_blocks.png and comparison_graph.png show the images of triggers in the alternative environment that we showed to our participants before choosing their preferred environment. The questions were identical for both participant groups.

comparison.png shows the questions we used to determine a participant's preference between the two programming environment alternatives.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of F1-score, Precision, and Recall of anomaly detection models.