Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Different graph types may differ in their suitability to support group comparisons, due to the underlying graph schemas. This study examined whether graph schemas are based on perceptual features (i.e., each graph type, e.g., bar or line graph, has its own graph schema) or common invariant structures (i.e., graph types share common schemas). Furthermore, it was of interest which graph type (bar, line, or pie) is optimal for comparing discrete groups. A switching paradigm was used in three experiments. Two graph types were examined at a time (Experiment 1: bar vs. line, Experiment 2: bar vs. pie, Experiment 3: line vs. pie). On each trial, participants received a data graph presenting the data from three groups and were to determine the numerical difference of group A and group B displayed in the graph. We scrutinized whether switching the type of graph from one trial to the next prolonged RTs. The slowing of RTs in switch trials in comparison to trials with only one graph type can indicate to what extent the graph schemas differ. As switch costs were observed in all pairings of graph types, none of the different pairs of graph types tested seems to fully share a common schema. Interestingly, there was tentative evidence for differences in switch costs among different pairings of graph types. Smaller switch costs in Experiment 1 suggested that the graph schemas of bar and line graphs overlap more strongly than those of bar graphs and pie graphs or line graphs and pie graphs. This implies that results were not in line with completely distinct schemas for different graph types either. Taken together, the pattern of results is consistent with a hierarchical view according to which a graph schema consists of parts shared for different graphs and parts that are specific for each graph type. Apart from investigating graph schemas, the study provided evidence for performance differences among graph types. We found that bar graphs yielded the fastest group comparisons compared to line graphs and pie graphs, suggesting that they are the most suitable when used to compare discrete groups.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Companion data for the creation of a banksia plot:
Background: In research evaluating statistical analysis methods, a common aim is to compare point estimates and confidence intervals (CIs) calculated from different analyses. This can be challenging when the outcomes (and their scale ranges) differ across datasets. We therefore developed a plot to facilitate pairwise comparisons of point estimates and confidence intervals from different statistical analyses both within and across datasets.
Methods: The plot was developed and refined over the course of an empirical study. To compare results from a variety of different studies, a system of centring and scaling is used. Firstly, the point estimates from reference analyses are centred to zero, followed by scaling confidence intervals to span a range of one. The point estimates and confidence intervals from matching comparator analyses are then adjusted by the same amounts. This enables the relative positions of the point estimates and CI widths to be quickly assessed while maintaining the relative magnitudes of the difference in point estimates and confidence interval widths between the two analyses. Banksia plots can be graphed in a matrix, showing all pairwise comparisons of multiple analyses. In this paper, we show how to create a banksia plot and present two examples: the first relates to an empirical evaluation assessing the difference between various statistical methods across 190 interrupted time series (ITS) data sets with widely varying characteristics, while the second example assesses data extraction accuracy, comparing results obtained from analysing original study data (43 ITS studies) with those obtained by four researchers from datasets digitally extracted from graphs in the accompanying manuscripts.
Results: In the banksia plot of the statistical method comparison, it was clear that there was no difference, on average, in point estimates, and it was straightforward to ascertain which methods resulted in smaller, similar or larger confidence intervals than others. In the banksia plot comparing analyses from digitally extracted data to those from the original data, it was clear that both the point estimates and confidence intervals were all very similar among data extractors and original data.
Conclusions: The banksia plot, a graphical representation of centred and scaled confidence intervals, provides a concise summary of comparisons between multiple point estimates and associated CIs in a single graph. Through this visualisation, patterns and trends in the point estimates and confidence intervals can be easily identified.
This collection of files allows the user to create the images used in the companion paper and to amend this code to create their own banksia plots using either Stata version 17 or R version 4.3.1.
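The centring-and-scaling step described above is simple to reproduce. Below is a minimal Python sketch of that transformation (the released companion code is for Stata 17 and R 4.3.1; the function and variable names here are hypothetical, chosen only for illustration).

```python
# Hypothetical illustration of the centring/scaling described above,
# not the companion Stata/R code.

def centre_and_scale(ref_est, ref_lcl, ref_ucl, cmp_est, cmp_lcl, cmp_ucl):
    """Centre the reference point estimate at zero and scale its CI to span 1,
    then apply the same shift and scale to the matching comparator analysis."""
    shift = ref_est            # centre the reference estimate to zero
    scale = ref_ucl - ref_lcl  # the reference CI width becomes 1 after division

    def transform(x):
        return (x - shift) / scale

    return (
        (transform(ref_est), transform(ref_lcl), transform(ref_ucl)),
        (transform(cmp_est), transform(cmp_lcl), transform(cmp_ucl)),
    )

# Example: reference estimate 2.0 (95% CI 1.0 to 3.0),
# comparator estimate 2.5 (95% CI 1.0 to 4.0).
ref, cmp = centre_and_scale(2.0, 1.0, 3.0, 2.5, 1.0, 4.0)
print(ref)  # (0.0, -0.5, 0.5)  -> centred at 0, CI spans a range of 1
print(cmp)  # (0.25, -0.5, 1.0) -> same shift/scale, relative width preserved
```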
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends TSMx is an R script that was developed to facilitate multi-temporal-scale visualizations of time-series data. The script requires only a two-column CSV of years and values to plot the slope of the linear regression line for all possible year combinations from the supplied temporal range. The outputs include a time-series matrix showing slope direction based on the linear regression, slope values plotted with colors indicating magnitude, and results of a Mann-Kendall test. The start year is indicated on the y-axis and the end year is indicated on the x-axis. In the example below, the cell in the top-right corner is the direction of the slope for the temporal range 2001–2019. The red line corresponds with the temporal range 2010–2019 and an arrow is drawn from the cell that represents that range. One cell is highlighted with a black border to demonstrate how to read the chart—that cell represents the slope for the temporal range 2004–2014. This publication entry also includes an excel template that produces the same visualizations without a need to interact with any code, though minor modifications will need to be made to accommodate year ranges other than what is provided. TSMx for R was developed by Georgios Boumis; TSMx was originally conceptualized and created by Brad G. Peter in Microsoft Excel. Please refer to the associated publication: Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624. https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624 TSMx sample chart from the supplied Excel template. Data represent the productivity of rice agriculture in Vietnam as measured via EVI (enhanced vegetation index) from the NASA MODIS data product (MOD13Q1.V006). 
TSMx R script:
# import packages
library(dplyr)
library(readr)
library(ggplot2)
library(tibble)
library(tidyr)
library(forcats)
library(Kendall)
options(warn = -1) # disable warnings

# read data (.csv file with "Year" and "Value" columns)
data <- read_csv("EVI.csv")

# prepare row/column names for output matrices
years <- data %>% pull("Year")
r.names <- years[-length(years)]
c.names <- years[-1]
years <- years[-length(years)]

# initialize output matrices
sign.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
pval.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
slope.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))

# function to return remaining years given a start year
getRemain <- function(start.year) {
  years <- data %>% pull("Year")
  start.ind <- which(data[["Year"]] == start.year) + 1
  remain <- years[start.ind:length(years)]
  return(remain)
}

# function to subset data for a start/end year combination
splitData <- function(end.year, start.year) {
  keep <- which(data[['Year']] >= start.year & data[['Year']] <= end.year)
  batch <- data[keep,]
  return(batch)
}

# function to fit linear regression and return slope direction
fitReg <- function(batch) {
  trend <- lm(Value ~ Year, data = batch)
  slope <- coefficients(trend)[[2]]
  return(sign(slope))
}

# function to fit linear regression and return slope magnitude
fitRegv2 <- function(batch) {
  trend <- lm(Value ~ Year, data = batch)
  slope <- coefficients(trend)[[2]]
  return(slope)
}

# function to implement Mann-Kendall (MK) trend test and return significance
# the test is implemented only for n >= 8
getMann <- function(batch) {
  if (nrow(batch) >= 8) {
    mk <- MannKendall(batch[['Value']])
    pval <- mk[['sl']]
  } else {
    pval <- NA
  }
  return(pval)
}

# function to return slope direction for all combinations given a start year
getSign <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  signs <- lapply(combs, fitReg)
  return(signs)
}

# function to return MK significance for all combinations given a start year
getPval <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  pvals <- lapply(combs, getMann)
  return(pvals)
}

# function to return slope magnitude for all combinations given a start year
getMagn <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  magns <- lapply(combs, fitRegv2)
  return(magns)
}

# retrieve slope direction, MK significance, and slope magnitude
signs <- lapply(years, getSign)
pvals <- lapply(years, getPval)
magns <- lapply(years, getMagn)

# fill-in output matrices
dimension <- nrow(sign.matrix)
for (i in 1:dimension) {
  sign.matrix[i, i:dimension] <- unlist(signs[i])
  pval.matrix[i, i:dimension] <- unlist(pvals[i])
  slope.matrix[i, i:dimension] <- unlist(magns[i])
}
sign.matrix <-...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Graph editing is a vastly studied subject, with many heuristics to compare topologies and many NP-hard problems. Here, we present a method relying on the specificities of what a pangenome graph is (a collection of subsequences linked by edges, which represents the embedding of genomes inside a graph structure) to formulate an O(n) solution in this specific case. It allows us to pinpoint dissimilarities between graphs, and we can analyse how such graphs differ when built with different tools or parameters.
Warning: all graphs are given as they came out of the Minigraph-Cactus and PGGB pipelines. This means that, since `rs-pancat-compare` can only compare GFA 1.0, you must perform a conversion using the `vg toolkit` (see [commands available on this GitHub](https://github.com/dubssieg/pancat_paper)).
Contains the raw `.fasta` genomes used to build the yeast chromosome 1 graphs described in the publication.
Contains the computed distance, variants, and sequence complexity analysis results as `.json` files.
Contains the `.gfa` graphs used for the comparison of the impact of the reference choice against the secondary genome order in Minigraph-Cactus (fig 1A of the article).
Contains the `.gfa` graphs used for the comparison of the impact of the reference choice in Minigraph-Cactus against PGGB (fig 1B of the article).
These archives are replicates, with varying references, of an experiment made by adding more and more genomes to the graphs. The file names range from 2 to 15, these numbers being the number of genomes included in the graph. (Yeast dataset, chromosome 1)
This archive contains graphs made using the same 15 genomes of yeast (chromosome 1) on three different versions of Minigraph-Cactus and three different versions of PGGB.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Measuring the quality of Question Answering (QA) systems is a crucial task to validate the results of novel approaches. However, there are already indicators of a reproducibility crisis as many published systems have used outdated datasets or use subsets of QA benchmarks, making it hard to compare results. We identified the following core problems: there is no standard data format, instead, proprietary data representations are used by the different partly inconsistent datasets; additionally, the characteristics of datasets are typically not reflected by the dataset maintainers nor by the system publishers. To overcome these problems, we established an ontology, the Question Answering Dataset Ontology (QADO), for representing the QA datasets in RDF. The following datasets were mapped into the ontology: the QALD series, LC-QuAD series, RuBQ series, ComplexWebQuestions, and Mintaka. Hence, the integrated data in QADO covers widely used datasets and multilinguality. Additionally, we did intensive analyses of the datasets to identify their characteristics to make it easier for researchers to identify specific research questions and to select well-defined subsets. The provided resource will enable the research community to improve the quality of their research and support the reproducibility of experiments.
Here, the mapping results of the QADO process, the SPARQL queries for data analytics, and the archived analytics results file are provided.
Up-to-date statistics can be created automatically by the script provided at the corresponding QADO GitHub RDFizer repository.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LauNuts is an RDF Knowledge Graph consisting of:
Local Administrative Units (LAU) and Nomenclature of Territorial Units for Statistics (NUTS)
Matplotlib is a tremendous visualization library in Python for 2D plots of arrays. It is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack. It was introduced by John Hunter in 2002.
A bar plot or bar graph is a graph that represents categorical data with rectangular bars whose lengths or heights are proportional to the values they represent. Bar plots can be drawn horizontally or vertically.
A bar chart is a great way to compare categorical data across one or two dimensions. More often than not, it’s more interesting to compare values across two dimensions and for that, a grouped bar chart is needed.
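As an illustration of the grouped bar chart mentioned above, here is a short, self-contained Matplotlib sketch; the category names and values are made up purely for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up example data: two measurement series across four categories.
categories = ["Q1", "Q2", "Q3", "Q4"]
series_a = [20, 34, 30, 35]
series_b = [25, 32, 34, 20]

x = np.arange(len(categories))  # positions of the category groups
width = 0.35                    # width of each bar

fig, ax = plt.subplots()
ax.bar(x - width / 2, series_a, width, label="Series A")
ax.bar(x + width / 2, series_b, width, label="Series B")

ax.set_xticks(x)
ax.set_xticklabels(categories)
ax.set_ylabel("Value")
ax.set_title("Grouped bar chart comparing two series per category")
ax.legend()
plt.show()
```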
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description. The NetVote dataset contains the outputs of the NetVote program when applied to voting data coming from VoteWatch (http://www.votewatch.eu/).
These results were used in the following conference papers:
Source code. The NetVote source code is available on GitHub: https://github.com/CompNet/NetVotes.
Citation. If you use our dataset or tool, please cite article [1] above.
@InProceedings{Mendonca2015,
author = {Mendonça, Israel and Figueiredo, Rosa and Labatut, Vincent and Michelon, Philippe},
title = {Relevance of Negative Links in Graph Partitioning: A Case Study Using Votes From the {E}uropean {P}arliament},
booktitle = {2\textsuperscript{nd} European Network Intelligence Conference ({ENIC})},
year = {2015},
pages = {122-129},
address = {Karlskrona, SE},
publisher = {IEEE Publishing},
doi = {10.1109/ENIC.2015.25},
}
-------------------------
Details. This archive contains the following folders:
-------------------------
License. These data are shared under a Creative Commons 0 license.
Contact. Vincent Labatut <vincent.labatut@univ-avignon.fr> & Rosa Figueiredo <rosa.figueiredo@univ-avignon.fr>
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes the following columns:
A. a sequential student ID
B. an ID that defines a random group label and the notation
C. the used notation: User Story or Use Cases
D. the case they were assigned to: IFA, Sim, or Hos
E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
F. a categorical representation of the grade (L/M/H), where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise
G. the total number of classes in the student's conceptual model
H. the total number of relationships in the student's conceptual model
I. the total number of classes in the expert's conceptual model
J. the total number of relationships in the expert's conceptual model
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, and missing (see tagging scheme below)
P. the researchers' judgement on how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present.
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than a class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent a legacy system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.
All the calculations and information provided in the following sheets originate from that raw data.
Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection, including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes. However, we also provide the size ratio for the number of relationships between the student and expert models.
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
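The two ratios defined above can be expressed compactly. The following Python snippet mirrors those formulas; the counts used are illustrative only and are not taken from the dataset.

```python
def correctness(al, wr, so, om):
    # Correctness = AL / (AL + OM + SO + WR)
    return al / (al + om + so + wr)

def completeness(al, wr, om):
    # Completeness = (AL + WR) / (AL + WR + OM)
    return (al + wr) / (al + wr + om)

# Illustrative counts for one subject (not from the dataset):
al, wr, so, om = 8, 2, 1, 3
print(round(correctness(al, wr, so, om), 2))  # 8 / 14 = 0.57
print(round(completeness(al, wr, om), 2))     # 10 / 13 = 0.77
```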
For sheet 4 as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation - UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case - SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by the exam grades, converted to the categorical values High, Low, and Medium.
Diffusion MRI tractography is the only noninvasive method to measure the structural connectome in humans. However, recent validation studies have revealed limitations of modern tractography approaches, which lead to significant mistracking caused in part by local uncertainties in fiber orientations that accumulate to produce larger errors for longer streamlines. Characterizing the role of this length bias in tractography is complicated by the true underlying contribution of spatial embedding to brain topology. In this work, we compare graphs constructed with ex vivo tractography data in mice and neural tracer data from the Allen Mouse Brain Connectivity Atlas to random geometric surrogate graphs which preserve the low-order distance effects from each modality in order to quantify the role of geometry in various network properties. We find that geometry plays a substantially larger role in determining the topology of graphs produced by tractography than graphs produced by tracers. Tractography underestimates weights at long distances compared to neural tracers, which leads tractography to place network hubs close to the geometric center of the brain, as do corresponding tractography-derived random geometric surrogates, while tracer graphs place hubs further into peripheral areas of the cortex. We also explore the role of spatial embedding in modular structure, network efficiency and other topological measures in both modalities. Throughout, we compare the use of two different tractography streamline node assignment strategies and find that the overall differences between tractography approaches are small relative to the differences between tractography- and tracer-derived graphs. These analyses help quantify geometric biases inherent to tractography and promote the use of geometric benchmarking in future tractography validation efforts.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Significant milestones have been made in the development of COVID-19 diagnostic technologies. The Government of the Republic of Uganda and the line Ministry of Health mandated the Uganda Virus Research Institute (UVRI) to ensure the quality of COVID-19 diagnostics. Re-testing was one of the methods initiated by the UVRI to implement External Quality Assessment of COVID-19 molecular diagnostics.
Method: Participating laboratories were required by UVRI to submit their already tested and archived nasopharyngeal samples and corresponding metadata. These were then re-tested at UVRI using the WHO Berlin protocol, and the UVRI results were compared to those of the primary testing laboratories in order to ascertain performance agreement for the qualitative and quantitative results obtained. MS Excel window 12 and GraphPad Prism ver. 15 were used in the analysis. Bar graphs, pie charts and line graphs were used to compare performance agreement between the reference laboratory and primary testing laboratories.
Results: Eleven (11) Ministry of Health/Uganda Virus Research Institute COVID-19 accredited laboratories participated in the re-testing of quality control samples. 5/11 (45%) of the primary testing laboratories had 100% performance agreement with the National Reference Laboratory for the final test result. Even where there was concordance in the final test outcome (negative or positive) between UVRI and the primary testing laboratories, there were still differences in CT values. The differences in the Cycle Threshold (CT) values were insignificant except for Tenna & Pharma Laboratory and the UVRI (p = 0.0296). The differences in the CT values were not skewed to either the National Reference Laboratory (UVRI) or the primary testing laboratory but varied from one laboratory to another. In the remaining 6/11 (55%) laboratories where there were discrepancies in the aggregate test results, only samples initially tested and reported as positive by the primary laboratories were tested and found to be false positives by the UVRI COVID-19 National Reference Laboratory.
Conclusion: False positives were detected from public, private not-for-profit and private testing laboratories in almost equal proportion. There is a need for standardization of molecular testing platforms in Uganda. There is also an urgent need to improve the laboratory quality management systems of the molecular testing laboratories in order to minimize such discrepancies.
In this Zenodo repository we present the results of using KROWN to benchmark popular RDF Graph Materialization systems such as RMLMapper, RMLStreamer, Morph-KGC, SDM-RDFizer, and Ontop (in materialization mode).
What is KROWN 👑?
KROWN 👑 is a benchmark for materialization systems to construct Knowledge Graphs from (semi-)heterogeneous data sources using declarative mappings such as RML.
Many benchmarks already exist for virtualization systems, e.g., GTFS-Madrid-Bench, NPD, and BSBM, which focus on complex queries with a single declarative mapping. However, materialization systems are unaffected by complex queries, since their input is the dataset and the mappings to generate a Knowledge Graph. Some specialized datasets exist to benchmark specific limitations of materialization systems, such as duplicated or empty values in datasets, e.g., GENOMICS, but they do not cover all aspects of materialization systems. Therefore, it is hard to compare materialization systems with each other in general, which is where KROWN 👑 comes in!
Results
The raw results are available as ZIP archives; the analysis of the results is available in the spreadsheet results.ods.
Evaluation setup
We generated several scenarios using KROWN’s data generator and executed them 5 times with KROWN’s execution framework. All experiments were performed on Ubuntu 22.04 LTS machines (Linux 5.15.0, x86_64), each with an Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz, 48 GB of RAM, and 2 GB of swap memory. The output of each materialization system was set to N-Triples.
Materialization systems
We selected the most popular maintained materialization systems for constructing RDF graphs to perform our experiments with KROWN:
RMLMapper
RMLStreamer
Morph-KGC
SDM-RDFizer
OntopM (Ontop in materialization mode)
Note: KROWN is flexible and allows adding any other materialization system, see KROWN’s execution framework documentation for more information.
Scenarios
We consider the following scenarios:
Raw data: number of rows, columns and cell size
Duplicates & empty values: percentage of the data containing duplicates or empty values
Mappings: Triples Maps (TM), Predicate Object Maps (POM), Named Graph Maps (NG).
Joins: relations (1-N, N-1, N-M), conditions, and duplicates during joins
Note: KROWN is flexible and allows adding any other scenario, see KROWN’s data generator documentation for more information.
In the table below we list all parameter values we used to configure our scenarios:
| Scenario | Parameter values |
| --- | --- |
| Raw data: rows | 10K, 100K, 1M, 10M |
| Raw data: columns | 1, 10, 20, 30 |
| Raw data: cell size | 500, 1K, 5K, 10K |
| Duplicates: percentage | 0%, 25%, 50%, 75%, 100% |
| Empty values: percentage | 0%, 25%, 50%, 75%, 100% |
| Mappings: TMs + 5POMs | 1, 10, 20, 30 TMs |
| Mappings: 20TMs + POMs | 1, 3, 5, 10 POMs |
| Mappings: NG in SM | 1, 5, 10, 15 NGs |
| Mappings: NG in POM | 1, 5, 10, 15 NGs |
| Mappings: NG in SM/POM | 1/1, 5/5, 10/10, 15/15 NGs |
| Joins: 1-N relations | 1-1, 1-5, 1-10, 1-15 |
| Joins: N-1 relations | 1-1, 5-1, 10-1, 15-1 |
| Joins: N-M relations | 3-3, 3-5, 5-3, 10-5, 5-10 |
| Joins: join conditions | 1, 5, 10, 15 |
| Joins: join duplicates | 0, 5, 10, 15 |
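To make the duplicates and empty-values parameters above concrete, the sketch below generates a toy CSV in that style. It is a hypothetical illustration only, not KROWN’s actual data generator; see KROWN’s data generator documentation for the real tooling.

```python
import csv
import random

def generate_csv(path, rows=10_000, columns=10, duplicate_pct=25, empty_pct=0):
    """Write a toy CSV where roughly duplicate_pct% of rows repeat the first row
    and roughly empty_pct% of cells are left empty.
    Hypothetical illustration, not KROWN's generator."""
    random.seed(42)
    header = [f"col{i}" for i in range(columns)]
    first_row = [f"value_0_{i}" for i in range(columns)]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for r in range(rows):
            if r > 0 and random.random() < duplicate_pct / 100:
                row = list(first_row)  # duplicated row
            else:
                row = [f"value_{r}_{i}" for i in range(columns)]
            # blank out cells to simulate empty values
            row = ["" if random.random() < empty_pct / 100 else cell for cell in row]
            writer.writerow(row)

generate_csv("scenario_duplicates_25.csv", rows=1000, columns=10, duplicate_pct=25)
```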
The InterpretSELDM program is a graphical post processor designed to facilitate analysis and presentation of stormwater modeling results from the Stochastic Empirical Loading and Dilution Model (SELDM), which is a stormwater model developed by the U.S. Geological Survey in cooperation with the Federal Highway Administration. SELDM simulates flows, concentrations, and loads in stormflows from upstream basins, the highway, best management practice outfalls, and in the receiving water downstream of a highway. SELDM is designed to transform complex scientific data into meaningful information about (1) the risk of adverse effects from stormwater runoff on receiving waters, (2) the potential need for mitigation measures, and (3) the potential effectiveness of management measures for reducing those risks. SELDM produces results in (relatively) easy-to-use tab delimited output files that are designed for use with spreadsheets and graphing packages. However, time is needed to learn, understand, and use the SELDM output formats. Also, the SELDM output requires post-processing to extract the specific information that commonly is of interest to the user (for example, the percentage of storms above a user-specified value). Because SELDM output files are comprehensive, the locations of specific output values may not be obvious to the novice user or the occasional model user who does not consult the detailed model documentation. The InterpretSELDM program was developed as a postprocessor to facilitate analysis and presentation of SELDM results. The program provides graphical results and tab-delimited text summaries from simulation results. InterpretSELDM provides data summaries in seconds. In comparison, manually extracting the same information from SELDM outputs could take minutes to hours. It has an easy-to-use graphical user interface designed to quickly extract dilution factors, constituent concentrations, annual loads, and annual yields from all analyses within a SELDM project. The program provides the methods necessary to create scatterplots and boxplots for the extracted results. Graphs are more effective than tabular data for evaluating and communicating risk-based information to technical and nontechnical audiences. Commonly used spreadsheets provide methods for generating graphs, but do not provide probability-plots or boxplots, which are useful for examining extreme stormflow, concentration, and load values. Probability plot axes are necessary for evaluating stormflow information because the extreme values commonly are the values of concern. Boxplots provide a simple visual summary of results that can be used to compare different simulation results. The graphs created by using the InterpretSELDM program can be copied and pasted into word processors, spreadsheets, drawing software, and other programs. The graphs also can be saved in commonly used image-file formats.
https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-4231
This dataset contains the supplementary materials to our publication "Collaborative Problem Solving in Mixed Reality: A Study on Visual Graph Analysis", where we report on a study we conducted. Please refer to the publication for more details; the abstract can also be found at the end of this description. The dataset contains:
The collection of graphs with layout used in the study
The final, randomized experiment files used in the study
The source code of the study prototype
The collected, anonymized data in tabular form
The code for the statistical analysis
The Supplemental Materials PDF
Paper abstract: Problem solving is a composite cognitive process, invoking a number of systems and subsystems, such as perception and memory. Individuals may form collectives to solve a given problem together, in collaboration, especially when complexity is thought to be high. To determine if and when collaborative problem solving is desired, we must quantify collaboration first. For this, we investigate the practical virtue of collaborative problem solving. Using visual graph analysis, we perform a study with 72 participants in two countries and three languages. We compare ad hoc pairs to individuals and nominal pairs, solving two different tasks on graphs in visuospatial mixed reality. The average collaborating pair does not outdo its nominal counterpart, but it does have a significant trade-off against the individual: an ad hoc pair uses 1.46 more time to achieve 4.6 higher accuracy. We also use the concept of task instance complexity to quantify differences in complexity. As task instance complexity increases, these differences largely scale, though with two notable exceptions. With this study we show the importance of using nominal groups as benchmark in collaborative virtual environments research. We conclude that a mixed reality environment does not automatically imply superior collaboration.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This interactive tool allows users to generate tables and graphs on information relating to pregnancy and childbirth. All data comes from the CDC's PRAMS. Topics include: breastfeeding, prenatal care, insurance coverage and alcohol use during pregnancy.
Background: CPONDER is the interactive online data tool for the Centers for Disease Control and Prevention (CDC)'s Pregnancy Risk Assessment Monitoring System (PRAMS). PRAMS gathers state and national level data on a variety of topics related to pregnancy and childbirth. Examples of information include: breastfeeding, alcohol use, multivitamin use, prenatal care, and contraception.
User Functionality: Users select choices from three drop-down menus to search for data. The menus are state, year and topic. Users can then select the specific question from PRAMS they are interested in, and the data table or graph will appear. Users can then compare that question to another state or to another year to generate a new data table or graph.
Data Notes: The data source for CPONDER is PRAMS. The data is from every year between 2000 and 2008, and data is available at the state and national level. However, states must have participated in PRAMS to be part of CPONDER. Not every state, and not every year for every state, is available.
The ACM dataset contains papers published in KDD, SIGMOD, SIGCOMM, MobiCOMM, and VLDB, divided into three classes (Database, Wireless Communication, Data Mining). A heterogeneous graph is constructed, which comprises 3025 papers, 5835 authors, and 56 subjects. Paper features correspond to elements of a bag-of-words representation of keywords.
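As a toy illustration of the heterogeneous structure described above (paper, author, and subject nodes with typed edges), the sketch below builds a tiny typed graph with NetworkX; the node names and feature vector are made up and the snippet does not load the actual ACM files.

```python
import networkx as nx

# Toy heterogeneous graph mirroring the paper/author/subject structure
# described above; node names are invented for illustration only.
G = nx.Graph()
G.add_node("p1", ntype="paper", label="Data Mining",
           features=[1, 0, 1, 0])  # toy bag-of-words vector
G.add_node("a1", ntype="author")
G.add_node("s1", ntype="subject")

G.add_edge("p1", "a1", etype="written_by")
G.add_edge("p1", "s1", etype="about")

# Select nodes of one type, e.g. all papers.
papers = [n for n, d in G.nodes(data=True) if d["ntype"] == "paper"]
print(papers)  # ['p1']
```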
https://creativecommons.org/publicdomain/zero/1.0/
Stock market data can be interesting to analyze and as a further incentive, strong predictive models can have large financial payoff. The amount of financial data on the web is seemingly endless. A large and well structured dataset on a wide array of companies can be hard to come by. Here I provide a dataset with historical stock prices (last 5 years) for all companies currently found on the S&P 500 index.
The script I used to acquire all of these .csv files can be found in this GitHub repository. In the future, if you wish for a more up-to-date dataset, it can be used to acquire new versions of the .csv files.
The data is presented in a couple of formats to suit different individual's needs or computational limitations. I have included files containing 5 years of stock data (in the all_stocks_5yr.csv and corresponding folder) and a smaller version of the dataset (all_stocks_1yr.csv) with only the past year's stock data for those wishing to use something more manageable in size.
The folder individual_stocks_5yr contains files of data for individual stocks, labelled by their stock ticker name. The all_stocks_5yr.csv and all_stocks_1yr.csv contain this same data, presented in merged .csv files. Depending on the intended use (graphing, modelling etc.) the user may prefer one of these given formats.
All the files have the following columns:
Date - in format: yy-mm-dd
Open - price of the stock at market open (this is NYSE data, so all in USD)
High - highest price reached in the day
Low - lowest price reached in the day
Close - price of the stock at market close
Volume - number of shares traded
Name - the stock's ticker name
I scraped this data from Google finance using the python library 'pandas_datareader'. Special thanks to Kaggle, Github and The Market.
This dataset lends itself to some very interesting visualizations. One can look at simple things like how prices change over time, graph and compare multiple stocks at once, or generate and graph new metrics from the data provided. From these data, informative stock stats such as volatility and moving averages can be easily calculated. The million dollar question is: can you develop a model that can beat the market and allow you to make statistically informed trades!
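For example, a moving average and a simple rolling volatility of the kind mentioned above can be computed directly from all_stocks_5yr.csv with pandas. The snippet assumes the column names listed above and picks an arbitrary ticker.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the merged file and pick one ticker (column names as described above).
df = pd.read_csv("all_stocks_5yr.csv", parse_dates=["Date"])
stock = df[df["Name"] == "AAPL"].sort_values("Date").copy()

# 30-day moving average of the close price and rolling volatility of daily returns.
stock["ma30"] = stock["Close"].rolling(window=30).mean()
stock["volatility30"] = stock["Close"].pct_change().rolling(window=30).std()

stock.plot(x="Date", y=["Close", "ma30"],
           title="Close price and 30-day moving average")
plt.show()
```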
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Summarizing graphs w.r.t. structural features is important to reduce the graph's size and make tasks like indexing, querying, and visualization feasible. These datasets help to compare and analyze algorithms for graph summarization.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Source data for Fig 1d. Contains the tables of time (pump-probe delay) vs the measured and the normalized optical density. The time axes for different data runs have been shifted to align the maximum signals for easier data comparison. The normalized data in each graph has been shifted vertically by 0.3 with respect to the next graph to compare the decay rates. Because of the different experimental configurations for the NIR and the visible wavelength probes, the pump-probe delay at which the peak of each experiment occurs is slightly shifted. The time axes for the longer wavelengths (900 to 1300 nm) have been shifted by 0.6 ps to align the peaks for easier comparison. For normalization, the intensity of each modulation graph has been divided by the peak intensity (positive or negative, depending on the wavelength).
Source data for Fig 2b. Contains the tables of simulated reflectance of p polarized (Rp) and s polarized (Rs) light from the device, as well as experimentally measured reflectance, versus the wavelength.
Source data for Fig 2cd. Contains the tables of wavelength and permittivities of TiN and AZO films. The comments contain the film thicknesses.
Source data for Fig 3b. Contains the color map of the reflectance modulation for TiN versus wavelength (nm) and pump-probe delay (ps)
Source data for Fig 3c. Contains the normalized transient reflectance modulation vs time of TiN on Si at a wavelength of 505 nm.
Source data for Fig 3d. Contains the color map of the reflectance modulation of AZO film vs wavelength and pump probe delay.
Source data for Fig 3e. Contains the normalized transient reflectance modulation vs time of AZO at a wavelength of 1210 nm.
Source data for Fig 4a. Contains the color map of the reflectance modulation of the device under a visible probe, versus the wavelength and the pump-probe delay.
Source data for Fig 4b. Contains the color map of the reflectance modulation of the device under an infrared probe, versus the wavelength and the pump-probe delay.
Source data for Fig 4c. Contains the absorbance of light versus the wavelength in the TiN and the AZO layers, simulated by COMSOL Multiphysics
Source data for Fig 4d. Contains the normalized reflectance modulation of TiN film (505nm wavelength), AZO film (1210nm wavelength), Device (508 and 1180 nm wavelength), versus the pump probe delay. The time axes have been shifted to align the maximum reflectance modulation of the device with that of the individual films.
Source data for Fig 5b. Contains the normalized reflectance modulation of the device at various wavelengths versus time, together with the fits from the model. All wavelengths are in nm and time is in ps, unless otherwise stated. The data processing and labeling are the same as for Fig 1d.
All wavelengths are in nm and time is in ps, unless otherwise stated.
https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.11588/DATA/YYULL2
Reimplementation of four KG factorization methods and six negative sampling methods.
Abstract: Knowledge graphs are large, useful, but incomplete knowledge repositories. They encode knowledge through entities and relations which define each other through the connective structure of the graph. This has inspired methods for the joint embedding of entities and relations in continuous low-dimensional vector spaces, that can be used to induce new edges in the graph, i.e., link prediction in knowledge graphs. Learning these representations relies on contrasting positive instances with negative ones. Knowledge graphs include only positive relation instances, leaving the door open for a variety of methods for selecting negative examples. In this paper we present an empirical study on the impact of negative sampling on the learned embeddings, assessed through the task of link prediction. We use state-of-the-art knowledge graph embeddings (RESCAL, TransE, DistMult, and ComplEx) and evaluate on benchmark datasets (FB15k and WN18). We compare well-known methods for negative sampling and additionally propose embedding-based sampling methods. We note a marked difference in the impact of these sampling methods on the two datasets, with the "traditional" corrupting-positives method leading to the best results on WN18, while embedding-based methods benefit the task on FB15k.
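The "corrupting positives" strategy discussed in the abstract can be sketched in a few lines. The snippet below is a generic illustration, not the paper's code: it corrupts either the head or the tail of a positive triple with a randomly chosen entity.

```python
import random

def corrupt_positives(triple, entities, n_negatives=5):
    """Generate negative triples by replacing the head or the tail of a positive
    (head, relation, tail) triple with a random entity.
    Generic illustration of 'corrupting positives', not the paper's code."""
    head, relation, tail = triple
    negatives = []
    while len(negatives) < n_negatives:
        candidate = random.choice(entities)
        if random.random() < 0.5:
            negative = (candidate, relation, tail)  # corrupt the head
        else:
            negative = (head, relation, candidate)  # corrupt the tail
        if negative != triple:                      # keep only true negatives
            negatives.append(negative)
    return negatives

entities = ["Paris", "France", "Berlin", "Germany", "Rome"]
print(corrupt_positives(("Paris", "capital_of", "France"), entities))
```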