100+ datasets found
  1. Data_Sheet_1_Graph schema and best graph type to compare discrete groups:...

    • frontiersin.figshare.com
    docx
    Updated Jun 4, 2023
    Cite
    Fang Zhao; Robert Gaschler (2023). Data_Sheet_1_Graph schema and best graph type to compare discrete groups: Bar, line, and pie.docx [Dataset]. http://doi.org/10.3389/fpsyg.2022.991420.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Frontiers
    Authors
    Fang Zhao; Robert Gaschler
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Different graph types may differ in their suitability to support group comparisons, due to the underlying graph schemas. This study examined whether graph schemas are based on perceptual features (i.e., each graph type, e.g., bar or line graph, has its own graph schema) or common invariant structures (i.e., graph types share common schemas). Furthermore, it was of interest which graph type (bar, line, or pie) is optimal for comparing discrete groups. A switching paradigm was used in three experiments. Two graph types were examined at a time (Experiment 1: bar vs. line, Experiment 2: bar vs. pie, Experiment 3: line vs. pie). On each trial, participants received a data graph presenting the data from three groups and were to determine the numerical difference of group A and group B displayed in the graph. We scrutinized whether switching the type of graph from one trial to the next prolonged RTs. The slowing of RTs in switch trials in comparison to trials with only one graph type can indicate to what extent the graph schemas differ. As switch costs were observed in all pairings of graph types, none of the different pairs of graph types tested seems to fully share a common schema. Interestingly, there was tentative evidence for differences in switch costs among different pairings of graph types. Smaller switch costs in Experiment 1 suggested that the graph schemas of bar and line graphs overlap more strongly than those of bar graphs and pie graphs or line graphs and pie graphs. This implies that results were not in line with completely distinct schemas for different graph types either. Taken together, the pattern of results is consistent with a hierarchical view according to which a graph schema consists of parts shared for different graphs and parts that are specific for each graph type. Apart from investigating graph schemas, the study provided evidence for performance differences among graph types. We found that bar graphs yielded the fastest group comparisons compared to line graphs and pie graphs, suggesting that they are the most suitable when used to compare discrete groups.

  2. The banksia plot: a method for visually comparing point estimates and...

    • bridges.monash.edu
    • researchdata.edu.au
    txt
    Updated Oct 15, 2024
    Cite
    Simon Turner; Amalia Karahalios; Elizabeth Korevaar; Joanne E. McKenzie (2024). The banksia plot: a method for visually comparing point estimates and confidence intervals across datasets [Dataset]. http://doi.org/10.26180/25286407.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    Oct 15, 2024
    Dataset provided by
    Monash University
    Authors
    Simon Turner; Amalia Karahalios; Elizabeth Korevaar; Joanne E. McKenzie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Companion data for the creation of a banksia plot.

    Background: In research evaluating statistical analysis methods, a common aim is to compare point estimates and confidence intervals (CIs) calculated from different analyses. This can be challenging when the outcomes (and their scale ranges) differ across datasets. We therefore developed a plot to facilitate pairwise comparisons of point estimates and confidence intervals from different statistical analyses both within and across datasets.

    Methods: The plot was developed and refined over the course of an empirical study. To compare results from a variety of different studies, a system of centring and scaling is used. Firstly, the point estimates from reference analyses are centred to zero, followed by scaling confidence intervals to span a range of one. The point estimates and confidence intervals from matching comparator analyses are then adjusted by the same amounts. This enables the relative positions of the point estimates and CI widths to be quickly assessed while maintaining the relative magnitudes of the difference in point estimates and confidence interval widths between the two analyses. Banksia plots can be graphed in a matrix, showing all pairwise comparisons of multiple analyses. In this paper, we show how to create a banksia plot and present two examples: the first relates to an empirical evaluation assessing the difference between various statistical methods across 190 interrupted time series (ITS) data sets with widely varying characteristics, while the second example assesses data extraction accuracy comparing results obtained from analysing original study data (43 ITS studies) with those obtained by four researchers from datasets digitally extracted from graphs from the accompanying manuscripts.

    Results: In the banksia plot of statistical method comparison, it was clear that there was no difference, on average, in point estimates and it was straightforward to ascertain which methods resulted in smaller, similar or larger confidence intervals than others. In the banksia plot comparing analyses from digitally extracted data to those from the original data it was clear that both the point estimates and confidence intervals were all very similar among data extractors and original data.

    Conclusions: The banksia plot, a graphical representation of centred and scaled confidence intervals, provides a concise summary of comparisons between multiple point estimates and associated CIs in a single graph. Through this visualisation, patterns and trends in the point estimates and confidence intervals can be easily identified.

    This collection of files allows the user to create the images used in the companion paper and amend this code to create their own banksia plots using either Stata version 17 or R version 4.3.1.
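    For illustration, the centring-and-scaling step described in the Methods can be sketched in a few lines of Python. This is a hedged, illustrative sketch with made-up numbers, not the authors' Stata/R code:

    import matplotlib.pyplot as plt

    def centre_and_scale(ref_est, ref_lo, ref_hi, comp_est, comp_lo, comp_hi):
        # centre the reference point estimate at zero, scale its CI to a width of one,
        # and apply the same shift and scale to the comparator estimate and CI
        shift, scale = ref_est, ref_hi - ref_lo
        f = lambda x: (x - shift) / scale
        return [f(v) for v in (ref_est, ref_lo, ref_hi, comp_est, comp_lo, comp_hi)]

    # one pairwise comparison: reference analysis vs comparator analysis (illustrative values)
    r, rlo, rhi, c, clo, chi = centre_and_scale(1.8, 1.2, 2.6, 2.1, 1.0, 3.4)
    plt.errorbar([0], [r], yerr=[[r - rlo], [rhi - r]], fmt="o", label="reference")
    plt.errorbar([1], [c], yerr=[[c - clo], [chi - c]], fmt="o", label="comparator")
    plt.legend()
    plt.show()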

  3. Time-Series Matrix (TSMx): A visualization tool for plotting multiscale...

    • dataverse.harvard.edu
    Updated Jul 8, 2024
    Cite
    Georgios Boumis; Brad Peter (2024). Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends [Dataset]. http://doi.org/10.7910/DVN/ZZDYM9
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 8, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Georgios Boumis; Brad Peter
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends.

    TSMx is an R script that was developed to facilitate multi-temporal-scale visualizations of time-series data. The script requires only a two-column CSV of years and values to plot the slope of the linear regression line for all possible year combinations from the supplied temporal range. The outputs include a time-series matrix showing slope direction based on the linear regression, slope values plotted with colors indicating magnitude, and results of a Mann-Kendall test. The start year is indicated on the y-axis and the end year is indicated on the x-axis. In the example below, the cell in the top-right corner is the direction of the slope for the temporal range 2001–2019. The red line corresponds with the temporal range 2010–2019 and an arrow is drawn from the cell that represents that range. One cell is highlighted with a black border to demonstrate how to read the chart: that cell represents the slope for the temporal range 2004–2014. This publication entry also includes an Excel template that produces the same visualizations without a need to interact with any code, though minor modifications will need to be made to accommodate year ranges other than what is provided.

    TSMx for R was developed by Georgios Boumis; TSMx was originally conceptualized and created by Brad G. Peter in Microsoft Excel. Please refer to the associated publication: Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624. https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624

    TSMx sample chart from the supplied Excel template. Data represent the productivity of rice agriculture in Vietnam as measured via EVI (enhanced vegetation index) from the NASA MODIS data product (MOD13Q1.V006).
    TSMx R script:

    # import packages
    library(dplyr)
    library(readr)
    library(ggplot2)
    library(tibble)
    library(tidyr)
    library(forcats)
    library(Kendall)

    options(warn = -1) # disable warnings

    # read data (.csv file with "Year" and "Value" columns)
    data <- read_csv("EVI.csv")

    # prepare row/column names for output matrices
    years <- data %>% pull("Year")
    r.names <- years[-length(years)]
    c.names <- years[-1]
    years <- years[-length(years)]

    # initialize output matrices
    sign.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
    pval.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
    slope.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))

    # function to return remaining years given a start year
    getRemain <- function(start.year) {
      years <- data %>% pull("Year")
      start.ind <- which(data[["Year"]] == start.year) + 1
      remain <- years[start.ind:length(years)]
      return(remain)
    }

    # function to subset data for a start/end year combination
    splitData <- function(end.year, start.year) {
      keep <- which(data[["Year"]] >= start.year & data[["Year"]] <= end.year)
      batch <- data[keep, ]
      return(batch)
    }

    # function to fit linear regression and return slope direction
    fitReg <- function(batch) {
      trend <- lm(Value ~ Year, data = batch)
      slope <- coefficients(trend)[[2]]
      return(sign(slope))
    }

    # function to fit linear regression and return slope magnitude
    fitRegv2 <- function(batch) {
      trend <- lm(Value ~ Year, data = batch)
      slope <- coefficients(trend)[[2]]
      return(slope)
    }

    # function to implement Mann-Kendall (MK) trend test and return significance
    # the test is implemented only for n >= 8
    getMann <- function(batch) {
      if (nrow(batch) >= 8) {
        mk <- MannKendall(batch[["Value"]])
        pval <- mk[["sl"]]
      } else {
        pval <- NA
      }
      return(pval)
    }

    # function to return slope direction for all combinations given a start year
    getSign <- function(start.year) {
      remaining <- getRemain(start.year)
      combs <- lapply(remaining, splitData, start.year = start.year)
      signs <- lapply(combs, fitReg)
      return(signs)
    }

    # function to return MK significance for all combinations given a start year
    getPval <- function(start.year) {
      remaining <- getRemain(start.year)
      combs <- lapply(remaining, splitData, start.year = start.year)
      pvals <- lapply(combs, getMann)
      return(pvals)
    }

    # function to return slope magnitude for all combinations given a start year
    getMagn <- function(start.year) {
      remaining <- getRemain(start.year)
      combs <- lapply(remaining, splitData, start.year = start.year)
      magns <- lapply(combs, fitRegv2)
      return(magns)
    }

    # retrieve slope direction, MK significance, and slope magnitude
    signs <- lapply(years, getSign)
    pvals <- lapply(years, getPval)
    magns <- lapply(years, getMagn)

    # fill-in output matrices
    dimension <- nrow(sign.matrix)
    for (i in 1:dimension) {
      sign.matrix[i, i:dimension] <- unlist(signs[i])
      pval.matrix[i, i:dimension] <- unlist(pvals[i])
      slope.matrix[i, i:dimension] <- unlist(magns[i])
    }
    sign.matrix <-...

  4. Data from: Pairwise graph edit distance characterizes the impact of the...

    • zenodo.org
    zip
    Updated Dec 9, 2024
    Cite
    Siegfried Dubois; Claire Lemaitre; Thomas Faraut; Matthias Zytnicki (2024). Pairwise graph edit distance characterizes the impact of the construction method on pangenome graphs [Dataset]. http://doi.org/10.5281/zenodo.10932490
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 9, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Siegfried Dubois; Claire Lemaitre; Thomas Faraut; Matthias Zytnicki
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Graph edition is a vastly studied subject, with many heuristics to compare topologies, and many NP-hard problems. Here, we present a method, relying on the specificities of what a pangenome graph is (a collection of subsequences linked by edges, representing the embedding of genomes inside a graph structure), to formulate an O(n) solution in this specific case. It allows us to pinpoint dissimilarities between graphs, and we can analyse how such graphs differ when built with different tools or parameters.

    Warning: all graphs are given as they came out of the Minigraph-Cactus and PGGB pipelines. Because `rs-pancat-compare` can only compare GFA 1.0, you must perform the conversion using the `vg toolkit` (see [commands available on this GitHub](https://github.com/dubssieg/pancat_paper)).

    Data description:

    Archive `yeast_dataset`:

    Contains the raw `.fasta` genomes used to build the yeast chromosome 1 graphs described in the publication.

    Archive `json_datasets_results`:

    Contains the computed distance, variants, and sequence complexity analysis results as `.json` files.

    Archive `reference_impact`:

    Contains the `.gfa` graphs used for the comparison of the impact of the reference choice against the secondary genome order in Minigraph-Cactus (fig 1A of the article).

    Archive `mgc_vs_pggb`:

    Contains the `.gfa` graphs used for the comparison of the impact of the reference choice in Minigraph-Cactus against PGGB (fig 1B of the article).

    Archives `growth_replicate_XX` (not kept in paper):

    These archives are replicates, with varying references, of an experiment made by adding more and more genomes to the graphs. The file names range from 2 to 15, these numbers being the number of genomes included in the graph (yeast dataset, chromosome 1).

    Archive `software_evolution` (not kept in paper):

    This archive contains graphs made using the same 15 genomes of yeast (chromosome 1) on three different versions of Minigraph-Cactus and three different versions of PGGB.

  5. QADO: An RDF Representation of Question Answering Datasets and their...

    • figshare.com
    zip
    Updated May 31, 2023
    Cite
    Andreas Both; Oliver Schmidtke; Aleksandr Perevalov (2023). QADO: An RDF Representation of Question Answering Datasets and their Analyses for Improving Reproducibility [Dataset]. http://doi.org/10.6084/m9.figshare.21750029.v3
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Andreas Both; Oliver Schmidtke; Aleksandr Perevalov
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Measuring the quality of Question Answering (QA) systems is a crucial task to validate the results of novel approaches. However, there are already indicators of a reproducibility crisis as many published systems have used outdated datasets or use subsets of QA benchmarks, making it hard to compare results. We identified the following core problems: there is no standard data format, instead, proprietary data representations are used by the different partly inconsistent datasets; additionally, the characteristics of datasets are typically not reflected by the dataset maintainers nor by the system publishers. To overcome these problems, we established an ontology---Question Answering Dataset Ontology (QADO)---for representing the QA datasets in RDF. The following datasets were mapped into the ontology: the QALD series, LC-QuAD series, RuBQ series, ComplexWebQuestions, and Mintaka. Hence, the integrated data in QADO covers widely used datasets and multilinguality. Additionally, we did intensive analyses of the datasets to identify their characteristics to make it easier for researchers to identify specific research questions and to select well-defined subsets. The provided resource will enable the research community to improve the quality of their research and support the reproducibility of experiments.

    Here, the mapping results of the QADO process, the SPARQL queries for data analytics, and the archived analytics results file are provided.

    Up-to-date statistics can be created automatically by the script provided at the corresponding QADO GitHub RDFizer repository.

  6. Data from: LauNuts: A Knowledge Graph to identify and compare geographic...

    • figshare.com
    • data.niaid.nih.gov
    zip
    Updated Mar 22, 2023
    Cite
    Adrian Wilke; Axel Ngonga (2023). LauNuts: A Knowledge Graph to identify and compare geographic regions in the European Union [Dataset]. http://doi.org/10.6084/m9.figshare.22272067.v2
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 22, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Adrian Wilke; Axel Ngonga
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    European Union
    Description

    LauNuts is an RDF Knowledge Graph consisting of:

    Local Administrative Units (LAU) and Nomenclature of Territorial Units for Statistics (NUTS)

    https://w3id.org/launuts

  7. Group Bar Chart

    • kaggle.com
    Updated Oct 9, 2021
    Cite
    AKV (2021). Group Bar Chart [Dataset]. https://www.kaggle.com/vermaamitesh/group-bar-chart/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 9, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    AKV
    Description

    Matplotlib is a tremendous visualization library in Python for 2D plots of arrays. Matplotlib is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack. It was introduced by John Hunter in 2002.

    A bar plot or bar graph is a chart that represents categorical data with rectangular bars whose lengths or heights are proportional to the values they represent. Bar plots can be drawn horizontally or vertically.

    A bar chart is a great way to compare categorical data across one or two dimensions. More often than not, it’s more interesting to compare values across two dimensions and for that, a grouped bar chart is needed.
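    As an illustration of the grouped bar chart described above, here is a minimal Matplotlib sketch (category names and values are made up):

    import numpy as np
    import matplotlib.pyplot as plt

    categories = ["A", "B", "C"]
    group1 = [4, 7, 3]
    group2 = [5, 6, 8]

    x = np.arange(len(categories))  # one slot per category
    width = 0.35                    # width of each bar

    fig, ax = plt.subplots()
    ax.bar(x - width / 2, group1, width, label="Group 1")
    ax.bar(x + width / 2, group2, width, label="Group 2")
    ax.set_xticks(x)
    ax.set_xticklabels(categories)
    ax.legend()
    plt.show()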

  8. NetVotes ENIC Dataset

    • zenodo.org
    • explore.openaire.eu
    txt, zip
    Updated Oct 1, 2024
    Cite
    Israel Mendonça; Vincent Labatut; Rosa Figueiredo (2024). NetVotes ENIC Dataset [Dataset]. http://doi.org/10.5281/zenodo.6815510
    Explore at:
    Available download formats: zip, txt
    Dataset updated
    Oct 1, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Israel Mendonça; Vincent Labatut; Rosa Figueiredo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description. The NetVote dataset contains the outputs of the NetVote program when applied to voting data coming from VoteWatch (http://www.votewatch.eu/).

    These results were used in the following conference papers:

    1. I. Mendonça, R. Figueiredo, V. Labatut, and P. Michelon, “Relevance of Negative Links in Graph Partitioning: A Case Study Using Votes From the European Parliament,” in 2nd European Network Intelligence Conference, 2015, pp. 122–129. ⟨hal-01176090⟩ DOI: 10.1109/ENIC.2015.25
    2. I. Mendonça, R. Figueiredo, V. Labatut, and P. Michelon, “Informative Value of Negative Links for Graph Partitioning, with an application to European Parliament Votes,” in 6ème Conférence sur les modèles et l'analyse de réseaux : approches mathématiques et informatiques, 2015, p. 12p. ⟨hal-02055158⟩

    Source code. The NetVote source code is available on GitHub: https://github.com/CompNet/NetVotes.

    Citation. If you use our dataset or tool, please cite article [1] above.


    @InProceedings{Mendonca2015,
      author    = {Mendonça, Israel and Figueiredo, Rosa and Labatut, Vincent and Michelon, Philippe},
      title     = {Relevance of Negative Links in Graph Partitioning: A Case Study Using Votes From the {E}uropean {P}arliament},
      booktitle = {2\textsuperscript{nd} European Network Intelligence Conference ({ENIC})},
      year      = {2015},
      pages     = {122-129},
      address   = {Karlskrona, SE},
      publisher = {IEEE Publishing},
      doi       = {10.1109/ENIC.2015.25},
    }

    -------------------------

    Details. This archive contains the following folders:

    • `votewatch_data`: the raw data extracted from the VoteWatch website.
      • `VoteWatch Europe European Parliament, Council of the EU.csv`: list of the documents voted during the considered term, with some details such as the date and topic.
      • `votes_by_document`: this folder contains a collection of CSV files, each one describing the outcome of the vote session relatively to one specific document.
      • `intermediate_files`: this folder contains several CSV files:
        • `allvotes.csv`: concatenation of all vote outcomes for all documents and all MEPS. Can be considered as a compact representation of the data contained in the folder `votes_by_document`.
        • `loyalty.csv`: same as allvotes.csv, but for the loyalty (i.e. whether or not the MEP voted like the majority of the MEPs in his political group).
        • `MPs.csv`: list of the MEPs having voted at least once in the considered term, with their details.
        • `policies.csv`: list of the topics considered during the term.
        • `qtd_docs.csv`: list of the topics with the corresponding number of documents.
    • `parallel_ils_results`: contains the raw results of the ILS tool. This is an external algorithm able to estimate the optimal partition of the network nodes in terms of structural balance. It was applied to all the networks extracted by our scripts (from the VoteWatch data), and the produced files were placed here for postprocessing. Each subfolder corresponds to one of the topic-year pairs.
    • `output_files`: contains the file produced by our scripts.
      • `agreement`: histograms representing the distributions of agreement and rebellion indices. Each subfolder corresponds to a specific topic.
      • `community_algorithms_csv`: Performances obtained by the partitioning algorithms (for both community detection and correlation clustering). Each subfolder corresponds to a specific topic.
      • `xxxx_cluster_information.csv`: table containing several variants of the imbalance measure, for the considered algorithms.
      • `community_algorithms_results`: Comparison of the partitions detected by the various algorithms considered, and distribution of the cluster/community sizes. Each subfolder corresponds to a specific topic.
      • `xxxx_cluster_comparison.csv`: table comparing the partitions detected by the community detection algorithms, in terms of Rand index and other measures.
      • `xxxx_ils_cluster_comparison.csv`: like `xxxx_cluster_comparison.csv`, except we compare the partition of community detection algorithms with that of the ILS.
      • `xxxx_yyyy_distribution.pdf`: histogram of the community (or cluster) sizes detected by algorithm `yyyy`.
      • `graphs`: the networks extracted from the vote data. Each subfolder corresponds to a specific topic.
      • `xxxx_complete_graph.graphml`: network at the Graphml format, with all the information: nodes, edges, nodal attributes (including communities), weights, etc.
      • `xxxx_edges_Gephi.csv`: only the links, with their weights (i.e. vote similarity).
      • `xxxx_graph.g`: network at the g format (for ILS).
      • `xxxx_net_measures.csv`: table containing some stats on the network (number of links, etc.).
      • `xxxx_nodes_Gephi.csv`: list of nodes (i.e. MEPs), with details.
      • `plots`: synthesis plots from the paper.

    -------------------------

    License. These data are shared under a Creative Commons 0 license.

    Contact. Vincent Labatut <vincent.labatut@univ-avignon.fr> & Rosa Figueiredo <rosa.figueiredo@univ-avignon.fr>

  9. UC_vs_US Statistic Analysis.xlsx

    • figshare.com
    xlsx
    Updated Jul 9, 2020
    Cite
    F. (Fabiano) Dalpiaz (2020). UC_vs_US Statistic Analysis.xlsx [Dataset]. http://doi.org/10.23644/uu.12631628.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Utrecht University
    Authors
    F. (Fabiano) Dalpiaz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the used measures described in the paper. For each subject, it includes multiple columns:
    A. a sequential student ID
    B. an ID that defines a random group label and the notation
    C. the used notation: user stories or use cases
    D. the case they were assigned to: IFA, Sim, or Hos
    E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
    F. a categorical representation of the grade L/M/H, where H is greater or equal to 80, M is between 65 (included) and 80 (excluded), L otherwise
    G. the total number of classes in the student's conceptual model
    H. the total number of relationships in the student's conceptual model
    I. the total number of classes in the expert's conceptual model
    J. the total number of relationships in the expert's conceptual model
    K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below)
    P. the researchers' judgement on how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present

    Tagging scheme:
    Aligned (AL) - A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
    Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than a class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
    System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent a legacy system or the system under design (portal, simulator) are legitimate;
    Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud;
    Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.

    All the calculations and information provided in the following sheets originate from that raw data.

    Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection, including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.

    Sheet 3 (Size-Ratio): The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes. However, we also provide the size ratio for the number of relationships between student and expert model.

    Sheet 4 (Overall): Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR) and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
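    For illustration, the two ratios defined above can be recomputed from a subject's per-category counts. The following Python snippet is a hedged sketch (it is not part of the spreadsheet; the counts are made up):

    def correctness(al, wr, so, om):
        # aligned / (aligned + omitted + system-oriented + wrong)
        return al / (al + om + so + wr)

    def completeness(al, wr, om):
        # (aligned + wrong) / (aligned + wrong + omitted)
        return (al + wr) / (al + wr + om)

    # example: AL=8, WR=1, SO=1, OM=2 -> correctness = 8/12, completeness = 9/11
    print(correctness(8, 1, 1, 2), completeness(8, 1, 2))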

    For Sheet 4 as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:

    Sheet 5 (By-Notation): Model correctness and model completeness are compared by notation - UC, US.

    Sheet 6 (By-Case): Model correctness and model completeness are compared by case - SIM, HOS, IFA.

    Sheet 7 (By-Process): Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.

    Sheet 8 (By-Grade): Model correctness and model completeness are compared by the exam grades, converted to categorical values High, Medium, and Low.

  10. Data from: The role of spatial embedding in mouse brain networks constructed...

    • knowledge.uchicago.edu
    csv, py, zip
    Updated 2021
    Cite
    Trinkle, Scott (2021). The role of spatial embedding in mouse brain networks constructed from diffusion tractography and tracer injections [Dataset]. http://doi.org/10.6082/uchicago.3310
    Explore at:
    Available download formats: zip (61266042), zip (776798548), csv (4959773), py (4375)
    Dataset updated
    2021
    Dataset provided by
    Knowledge@UChicago
    Authors
    Trinkle, Scott
    Description

    Diffusion MRI tractography is the only noninvasive method to measure the structural connectome in humans. However, recent validation studies have revealed limitations of modern tractography approaches, which lead to significant mistracking caused in part by local uncertainties in fiber orientations that accumulate to produce larger errors for longer streamlines. Characterizing the role of this length bias in tractography is complicated by the true underlying contribution of spatial embedding to brain topology. In this work, we compare graphs constructed with ex vivo tractography data in mice and neural tracer data from the Allen Mouse Brain Connectivity Atlas to random geometric surrogate graphs which preserve the low-order distance effects from each modality in order to quantify the role of geometry in various network properties. We find that geometry plays a substantially larger role in determining the topology of graphs produced by tractography than graphs produced by tracers. Tractography underestimates weights at long distances compared to neural tracers, which leads tractography to place network hubs close to the geometric center of the brain, as do corresponding tractography-derived random geometric surrogates, while tracer graphs place hubs further into peripheral areas of the cortex. We also explore the role of spatial embedding in modular structure, network efficiency and other topological measures in both modalities. Throughout, we compare the use of two different tractography streamline node assignment strategies and find that the overall differences between tractography approaches are small relative to the differences between tractography- and tracer-derived graphs. These analyses help quantify geometric biases inherent to tractography and promote the use of geometric benchmarking in future tractography validation efforts.

  11. S1 Data -

    • figshare.com
    xlsx
    Updated Jan 24, 2024
    Cite
    Erick Jacob Okek; Fredrick Joshua Masembe; Jocelyn Kiconco; John Kayiwa; Esther Amwine; Daniel Obote; Stephen Alele; Charles Nahabwe; Jackson Were; Bernard Bagaya; Stephen Balinandi; Julius Lutwama; Pontiano Kaleebu (2024). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0287272.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jan 24, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Erick Jacob Okek; Fredrick Joshua Masembe; Jocelyn Kiconco; John Kayiwa; Esther Amwine; Daniel Obote; Stephen Alele; Charles Nahabwe; Jackson Were; Bernard Bagaya; Stephen Balinandi; Julius Lutwama; Pontiano Kaleebu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Significant milestones have been made in the development of COVID19 diagnostics technologies. The Government of the Republic of Uganda and the line Ministry of Health mandated the Uganda Virus Research Institute (UVRI) to ensure the quality of COVID19 diagnostics. Re-testing was one of the methods initiated by the UVRI to implement external quality assessment of COVID19 molecular diagnostics.

    Method: Participating laboratories were required by UVRI to submit their already tested and archived nasopharyngeal samples and corresponding metadata. These were then re-tested at UVRI using the WHO Berlin protocol, and the UVRI results were compared to those of the primary testing laboratories in order to ascertain performance agreement for the qualitative and quantitative results obtained. Ms Excel window 12 and GraphPad Prism ver 15 were used in the analysis. Bar graphs, pie charts and line graphs were used to compare performance agreement between the reference laboratory and the primary testing laboratories.

    Results: Eleven (11) Ministry of Health/Uganda Virus Research Institute COVID19 accredited laboratories participated in the re-testing of quality control samples. 5/11 (45%) of the primary testing laboratories had 100% performance agreement with the National Reference Laboratory for the final test result. Even where there was concordance in the final test outcome (negative or positive) between UVRI and primary testing laboratories, there were still differences in CT values. The differences in the Cycle Threshold (CT) values were insignificant except for Tenna & Pharma Laboratory and the UVRI (p = 0.0296). The difference in the CT values was not skewed to either the National Reference Laboratory (UVRI) or the primary testing laboratory but varied from one laboratory to another. In the remaining 6/11 (55%) laboratories where there were discrepancies in the aggregate test results, only samples initially tested and reported as positive by the primary laboratories were tested and found to be false positives by the UVRI COVID19 National Reference Laboratory.

    Conclusion: False positives were detected from public, private not-for-profit and private testing laboratories in almost equal proportion. There is need for standardization of molecular testing platforms in Uganda. There is also urgent need to improve the laboratory quality management systems of the molecular testing laboratories in order to minimize such discrepancies.

  12. Data from: Results of KROWN: Knowledge Graph Construction Benchmark

    • investigacion.usc.gal
    Updated 2024
    Cite
    Van Assche, Dylan; Chaves-Fraga, David; Dimou, Anastasia (2024). Results of KROWN: Knowledge Graph Construction Benchmark [Dataset]. https://investigacion.usc.gal/documentos/67321d87aea56d4af0484853
    Explore at:
    Dataset updated
    2024
    Authors
    Van Assche, Dylan; Chaves-Fraga, David; Dimou, Anastasia
    Description

    In this Zenodo repository we present the results of using KROWN to benchmark popular RDF Graph Materialization systems such as RMLMapper, RMLStreamer, Morph-KGC, SDM-RDFizer, and Ontop (in materialization mode).

    What is KROWN 👑?

    KROWN 👑 is a benchmark for materialization systems to construct Knowledge Graphs from (semi-)heterogeneous data sources using declarative mappings such as RML.

    Many benchmarks already exist for virtualization systems, e.g. GTFS-Madrid-Bench, NPD, and BSBM, which focus on complex queries with a single declarative mapping. However, materialization systems are unaffected by complex queries, since their input is the dataset and the mappings used to generate a Knowledge Graph. Some specialized datasets exist to benchmark specific limitations of materialization systems, such as duplicated or empty values in datasets, e.g. GENOMICS, but they do not cover all aspects of materialization systems. Therefore, it is hard to compare materialization systems with each other in general, which is where KROWN 👑 comes in!

    Results

    The raw results are available as ZIP archives; the analysis of the results is available in the spreadsheet results.ods.

    Evaluation setup

    We generated several scenarios using KROWN’s data generator and executed them 5 times with KROWN’s execution framework. All experiments were performed on Ubuntu 22.04 LTS machines (Linux 5.15.0, x86_64), each with an Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz, 48 GB of RAM, and 2 GB of swap memory. The output of each materialization system was set to N-Triples.

    Materialization systems

    We selected the most popular maintained materialization systems for constructing RDF graphs for performing our experiments with KROWN:

    RMLMapper

    RMLStreamer

    Morph-KGC

    SDM-RDFizer

    OntopM (Ontop in materialization mode)

    Note: KROWN is flexible and allows adding any other materialization system, see KROWN’s execution framework documentation for more information.

    Scenarios

    We consider the following scenarios:

    Raw data: number of rows, columns and cell size

    Duplicates & empty values: percentage of the data containing duplicates or empty values

    Mappings: Triples Maps (TM), Predicate Object Maps (POM), Named Graph Maps (NG).

    Joins: relations (1-N, N-1, N-M), conditions, and duplicates during joins

    Note: KROWN is flexible and allows adding any other scenario, see KROWN’s data generator documentation for more information.

    In the table below we list all parameter values we used to configure our scenarios:

    Scenario                    Parameter values
    Raw data: rows              10K, 100K, 1M, 10M
    Raw data: columns           1, 10, 20, 30
    Raw data: cell size         500, 1K, 5K, 10K
    Duplicates: percentage      0%, 25%, 50%, 75%, 100%
    Empty values: percentage    0%, 25%, 50%, 75%, 100%
    Mappings: TMs + 5POMs       1, 10, 20, 30 TMs
    Mappings: 20TMs + POMs      1, 3, 5, 10 POMs
    Mappings: NG in SM          1, 5, 10, 15 NGs
    Mappings: NG in POM         1, 5, 10, 15 NGs
    Mappings: NG in SM/POM      1/1, 5/5, 10/10, 15/15 NGs
    Joins: 1-N relations        1-1, 1-5, 1-10, 1-15
    Joins: N-1 relations        1-1, 5-1, 10-1, 15-1
    Joins: N-M relations        3-3, 3-5, 5-3, 10-5, 5-10
    Joins: join conditions      1, 5, 10, 15
    Joins: join duplicates      0, 5, 10, 15

  13. InterpretSELDM version 1.0 The Stochastic Empirical Loading and Dilution...

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). InterpretSELDM version 1.0 The Stochastic Empirical Loading and Dilution Model (SELDM) output interpreter [Dataset]. https://catalog.data.gov/dataset/interpretseldm-version-1-0-the-stochastic-empirical-loading-and-dilution-model-seldm-outpu
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    The InterpretSELDM program is a graphical post processor designed to facilitate analysis and presentation of stormwater modeling results from the Stochastic Empirical Loading and Dilution Model (SELDM), which is a stormwater model developed by the U.S. Geological Survey in cooperation with the Federal Highway Administration. SELDM simulates flows, concentrations, and loads in stormflows from upstream basins, the highway, best management practice outfalls, and in the receiving water downstream of a highway. SELDM is designed to transform complex scientific data into meaningful information about (1) the risk of adverse effects from stormwater runoff on receiving waters, (2) the potential need for mitigation measures, and (3) the potential effectiveness of management measures for reducing those risks. SELDM produces results in (relatively) easy-to-use tab delimited output files that are designed for use with spreadsheets and graphing packages. However, time is needed to learn, understand, and use the SELDM output formats. Also, the SELDM output requires post-processing to extract the specific information that commonly is of interest to the user (for example, the percentage of storms above a user-specified value). Because SELDM output files are comprehensive, the locations of specific output values may not be obvious to the novice user or the occasional model user who does not consult the detailed model documentation. The InterpretSELDM program was developed as a postprocessor to facilitate analysis and presentation of SELDM results. The program provides graphical results and tab-delimited text summaries from simulation results. InterpretSELDM provides data summaries in seconds. In comparison, manually extracting the same information from SELDM outputs could take minutes to hours. It has an easy-to-use graphical user interface designed to quickly extract dilution factors, constituent concentrations, annual loads, and annual yields from all analyses within a SELDM project. The program provides the methods necessary to create scatterplots and boxplots for the extracted results. Graphs are more effective than tabular data for evaluating and communicating risk-based information to technical and nontechnical audiences. Commonly used spreadsheets provide methods for generating graphs, but do not provide probability-plots or boxplots, which are useful for examining extreme stormflow, concentration, and load values. Probability plot axes are necessary for evaluating stormflow information because the extreme values commonly are the values of concern. Boxplots provide a simple visual summary of results that can be used to compare different simulation results. The graphs created by using the InterpretSELDM program can be copied and pasted into word processors, spreadsheets, drawing software, and other programs. The graphs also can be saved in commonly used image-file formats.

  14. Collaborative Problem Solving in Mixed Reality: A Study on Visual Graph...

    • darus.uni-stuttgart.de
    Updated Dec 19, 2024
    Cite
    Dimitar Garkov; Tommaso Piselli; Emilio Di Giacomo; Karsten Klein; Giuseppe Liotta; Fabrizio Montecchiani; Falk Schreiber (2024). Collaborative Problem Solving in Mixed Reality: A Study on Visual Graph Analysis - Replication data [Dataset]. http://doi.org/10.18419/DARUS-4231
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    DaRUS
    Authors
    Dimitar Garkov; Tommaso Piselli; Emilio Di Giacomo; Karsten Klein; Giuseppe Liotta; Fabrizio Montecchiani; Falk Schreiber
    License

    https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-4231

    Dataset funded by
    DFG
    Description

    This dataset contains the supplementary materials to our publication "Collaborative Problem Solving in Mixed Reality: A Study on Visual Graph Analysis", where we report on a study we conducted. Please refer to the publication for more details; the abstract can be found at the end of this description.

    The dataset contains:
    • The collection of graphs with layout used in the study
    • The final, randomized experiment files used in the study
    • The source code of the study prototype
    • The collected, anonymized data in tabular form
    • The code for the statistical analysis
    • The Supplemental Materials PDF

    Paper abstract: Problem solving is a composite cognitive process, invoking a number of systems and subsystems, such as perception and memory. Individuals may form collectives to solve a given problem together, in collaboration, especially when complexity is thought to be high. To determine if and when collaborative problem solving is desired, we must quantify collaboration first. For this, we investigate the practical virtue of collaborative problem solving. Using visual graph analysis, we perform a study with 72 participants in two countries and three languages. We compare ad hoc pairs to individuals and nominal pairs, solving two different tasks on graphs in visuospatial mixed reality. The average collaborating pair does not outdo its nominal counterpart, but it does have a significant trade-off against the individual: an ad hoc pair uses 1.46 more time to achieve 4.6 higher accuracy. We also use the concept of task instance complexity to quantify differences in complexity. As task instance complexity increases, these differences largely scale, though with two notable exceptions. With this study we show the importance of using nominal groups as benchmark in collaborative virtual environments research. We conclude that a mixed reality environment does not automatically imply superior collaboration.

  15. CDC's PRAMS Online Data for Epidemiological Research (CPONDER)

    • data.niaid.nih.gov
    Updated Nov 30, 2010
    Cite
    (2010). CDC's PRAMS Online Data for Epidemiological Research (CPONDER) [Dataset]. http://doi.org/10.7910/DVN/1JPCH8
    Explore at:
    Dataset updated
    Nov 30, 2010
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This interactive tool allows users to generate tables and graphs on information relating to pregnancy and childbirth. All data comes from the CDC's PRAMS. Topics include: breastfeeding, prenatal care, insurance coverage and alcohol use during pregnancy.

    Background: CPONDER is the interactive online data tool for the Centers for Disease Control and Prevention (CDC)'s Pregnancy Risk Assessment Monitoring System (PRAMS). PRAMS gathers state and national level data on a variety of topics related to pregnancy and childbirth. Examples of information include: breastfeeding, alcohol use, multivitamin use, prenatal care, and contraception.

    User Functionality: Users select choices from three drop-down menus to search for data. The menus are state, year and topic. Users can then select the specific PRAMS question they are interested in, and the data table or graph will appear. Users can then compare that question to another state or to another year to generate a new data table or graph.

    Data Notes: The data source for CPONDER is PRAMS. The data is from every year between 2000 and 2008, and data is available at the state and national level. However, states must have participated in PRAMS to be part of CPONDER. Not every state, and not every year for every state, is available.

  16. ACM Dataset

    • paperswithcode.com
    Cite
    ACM Dataset [Dataset]. https://paperswithcode.com/dataset/acm
    Explore at:
    Description

    The ACM dataset contains papers published in KDD, SIGMOD, SIGCOMM, MobiCOMM, and VLDB, which are divided into three classes (Database, Wireless Communication, Data Mining). A heterogeneous graph is constructed, which comprises 3025 papers, 5835 authors, and 56 subjects. Paper features correspond to elements of a bag-of-words representation of keywords.

  17. S&P 500 stock data

    • kaggle.com
    zip
    Updated Aug 11, 2017
    Cite
    Cam Nugent (2017). S&P 500 stock data [Dataset]. https://www.kaggle.com/camnugent/sandp500
    Explore at:
    Available download formats: zip (31994392 bytes)
    Dataset updated
    Aug 11, 2017
    Authors
    Cam Nugent
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Stock market data can be interesting to analyze and as a further incentive, strong predictive models can have large financial payoff. The amount of financial data on the web is seemingly endless. A large and well structured dataset on a wide array of companies can be hard to come by. Here I provide a dataset with historical stock prices (last 5 years) for all companies currently found on the S&P 500 index.

    The script I used to acquire all of these .csv files can be found in this GitHub repository. In the future, if you wish for a more up-to-date dataset, it can be used to acquire new versions of the .csv files.

    Content

    The data is presented in a couple of formats to suit different individual's needs or computational limitations. I have included files containing 5 years of stock data (in the all_stocks_5yr.csv and corresponding folder) and a smaller version of the dataset (all_stocks_1yr.csv) with only the past year's stock data for those wishing to use something more manageable in size.

    The folder individual_stocks_5yr contains files of data for individual stocks, labelled by their stock ticker name. The all_stocks_5yr.csv and all_stocks_1yr.csv contain this same data, presented in merged .csv files. Depending on the intended use (graphing, modelling etc.) the user may prefer one of these given formats.

    All the files have the following columns:
    Date - in format yy-mm-dd
    Open - price of the stock at market open (this is NYSE data, so all in USD)
    High - highest price reached in the day
    Low - lowest price reached in the day
    Close - price of the stock at market close
    Volume - number of shares traded
    Name - the stock's ticker name

    Acknowledgements

    I scraped this data from Google finance using the python library 'pandas_datareader'. Special thanks to Kaggle, Github and The Market.

    Inspiration

    This dataset lends itself to some very interesting visualizations. One can look at simple things like how prices change over time, graph and compare multiple stocks at once, or generate and graph new metrics from the data provided. From these data, informative stock stats such as volatility and moving averages can be easily calculated. The million dollar question is: can you develop a model that can beat the market and allow you to make statistically informed trades?
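    As a hedged illustration (assuming all_stocks_5yr.csv is in the working directory, AAPL is among the tickers, and the column names match the list above), a rolling moving average and volatility can be computed with pandas:

    import pandas as pd

    df = pd.read_csv("all_stocks_5yr.csv", parse_dates=["Date"])
    aapl = df[df["Name"] == "AAPL"].sort_values("Date")
    aapl["ma_20"] = aapl["Close"].rolling(20).mean()               # 20-day moving average
    aapl["vol_20"] = aapl["Close"].pct_change().rolling(20).std()  # 20-day rolling volatility
    print(aapl[["Date", "Close", "ma_20", "vol_20"]].tail())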

  18. Datasets for Computing k-Bisimulations for Large Graphs

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 23, 2024
    Cite
    Rau, Jannik (2024). Datasets for Computing k-Bisimulations for Large Graphs [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10498845
    Explore at:
    Dataset updated
    Jan 23, 2024
    Dataset authored and provided by
    Rau, Jannik
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Summarizing graphs w.r.t. structural features is important to reduce the graph's size and make tasks like indexing, querying, and visualization feasible. These datasets help to compare and analyze algorithms for graph summarization.

  19. Data from: Engineering the Temporal Dynamics of All-Optical Switching with...

    • springernature.figshare.com
    bin
    Updated Nov 17, 2023
    Cite
    Soham Saha; Benjamin T. Diroll; Mustafa Goksu Ozlu; Sarah Nahar Chowdhury; Samuel Peana; Zhaxylyk Kudyshev; Richard D. Schaller; Zubin Jacob; Vladimir Shalaev; Alexandar Kildishev; Alexandra Boltasseva (2023). Engineering the Temporal Dynamics of All-Optical Switching with Fast and Slow Materials [Dataset]. http://doi.org/10.6084/m9.figshare.23734116.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Nov 17, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Soham Saha; Benjamin T. Diroll; Mustafa Goksu Ozlu; Sarah Nahar Chowdhury; Samuel Peana; Zhaxylyk Kudyshev; Richard D. Schaller; Zubin Jacob; Vladimir Shalaev; Alexandar Kildishev; Alexandra Boltasseva
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Source data for Fig 1d. Contains the tables of time (pump-probe delay) vs the measured and the normalized optical density. The time axes for different data runs have been shifted to align the maximum signals for easier data comparison. The normalized data in each graph have been shifted vertically by 0.3 with respect to the next graph to compare the decay rates. Because of the different experimental configurations for the NIR and the visible wavelength probes, the pump-probe delay at which the peak of each experiment occurs is slightly shifted. The time axes for the longer wavelengths (900 to 1300 nm) have been shifted by 0.6 ps to align the peaks for easier comparison. For normalization, the intensity of each modulation graph has been divided by the peak intensity (positive or negative, depending on the wavelength).
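    The peak normalization described above can be sketched in a couple of lines of Python (illustrative only, not the processing code used for the paper):

    import numpy as np

    def normalize_by_peak(trace):
        # divide a modulation trace by its peak intensity, positive or negative
        peak = trace[np.argmax(np.abs(trace))]
        return trace / peak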

    Source data for Fig 2b. Contains the tables of simulated reflectance of p polarized (Rp) and s polarized (Rs) light from the device, as well as experimentally measured reflectance, versus the wavelength.

    Source data for Fig 2cd. Contains the tables of wavelength and permittivities of TiN and AZO films. The comments contain the film thicknesses.

    Source data for Fig 3b. Contains the color map of the reflectance modulation for TiN versus wavelength (nm) and pump-probe delay (ps).

    Source data for Fig 3c. Contains the normalized transient reflectance modulation vs time of TiN on Si at a wavelength of 505 nm.

    Source data for Fig 3d. Contains the color map of the reflectance modulation of AZO film vs wavelength and pump probe delay.

    Source data for Fig 3e. Contains the normalized transient reflectance modulation vs time of AZO at a wavelength of 1210 nm.

    Source data for Fig 4a. Contains the color map of the reflectance modulation of the device under a visible probe, versus the wavelength and the pump-probe delay.

    Source data for Fig 4b. Contains the color map of the reflectance modulation of the device under an infrared probe, versus the wavelength and the pump-probe delay.

    Source data for Fig 4c. Contains the absorbance of light versus the wavelength in the TiN and the AZO layers, simulated by COMSOL Multiphysics.

    Source data for Fig 4d. Contains the normalized reflectance modulation of TiN film (505nm wavelength), AZO film (1210nm wavelength), Device (508 and 1180 nm wavelength), versus the pump probe delay. The time axes have been shifted to align the maximum reflectance modulation of the device with that of the individual films.

    Source data for Fig 5b. Contains the normalized reflectance modulation of the device at various wavelengths versus time, together with the fits from the model. The data processing and labeling are the same as for Fig 1d.

    All wavelengths are in nm and time is in ps, unless otherwise stated.

  20. h

    Data from: Negative Sampling for Learning Knowledge Graph Embeddings

    • heidata.uni-heidelberg.de
    zip
    Updated Sep 12, 2019
    Cite
    Bhushan Kotnis (2019). Negative Sampling for Learning Knowledge Graph Embeddings [Dataset]. http://doi.org/10.11588/DATA/YYULL2
    Explore at:
    Available download formats: zip (19883)
    Dataset updated
    Sep 12, 2019
    Dataset provided by
    heiDATA
    Authors
    Bhushan Kotnis
    License

    https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.11588/DATA/YYULL2

    Description

    Reimplementation of four KG factorization methods and six negative sampling methods.

    Abstract: Knowledge graphs are large, useful, but incomplete knowledge repositories. They encode knowledge through entities and relations which define each other through the connective structure of the graph. This has inspired methods for the joint embedding of entities and relations in continuous low-dimensional vector spaces, which can be used to induce new edges in the graph, i.e., link prediction in knowledge graphs. Learning these representations relies on contrasting positive instances with negative ones. Knowledge graphs include only positive relation instances, leaving the door open for a variety of methods for selecting negative examples. In this paper we present an empirical study of the impact of negative sampling on the learned embeddings, assessed through the task of link prediction. We use state-of-the-art knowledge graph embeddings -- RESCAL, TransE, DistMult and ComplEx -- and evaluate on the benchmark datasets FB15k and WN18. We compare well-known methods for negative sampling and additionally propose embedding-based sampling methods. We note a marked difference in the impact of these sampling methods on the two datasets, with the "traditional" corrupting-positives method leading to the best results on WN18, while embedding-based methods benefit the task on FB15k.
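    As a minimal sketch of the "traditional" corrupting-positives strategy mentioned above (not the embedding-based samplers proposed in the paper), a negative triple can be drawn by replacing either the head or the tail of a positive triple with a random entity and rejecting candidates that are already known to be true. All names below are hypothetical toy data.

        import random

        def corrupt_positive(triple, entities, known_triples):
            """Draw one negative example from a positive (head, relation, tail) triple
            by replacing either the head or the tail with a uniformly sampled entity,
            rejecting candidates that are themselves known positives."""
            head, relation, tail = triple
            while True:
                if random.random() < 0.5:
                    candidate = (random.choice(entities), relation, tail)
                else:
                    candidate = (head, relation, random.choice(entities))
                if candidate not in known_triples:
                    return candidate

        # Hypothetical toy knowledge graph.
        entities = ["paris", "france", "berlin", "germany"]
        positives = {("paris", "capital_of", "france"),
                     ("berlin", "capital_of", "germany")}
        print(corrupt_positive(("paris", "capital_of", "france"), entities, positives))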
