100+ datasets found
  1. Data_Sheet_1_Graph schema and best graph type to compare discrete groups:...

    • frontiersin.figshare.com
    docx
    Updated Jun 4, 2023
    Cite
    Fang Zhao; Robert Gaschler (2023). Data_Sheet_1_Graph schema and best graph type to compare discrete groups: Bar, line, and pie.docx [Dataset]. http://doi.org/10.3389/fpsyg.2022.991420.s001
    Explore at:
    docx
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Frontiers
    Authors
    Fang Zhao; Robert Gaschler
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Different graph types may differ in their suitability to support group comparisons, due to the underlying graph schemas. This study examined whether graph schemas are based on perceptual features (i.e., each graph type, e.g., bar or line graph, has its own graph schema) or common invariant structures (i.e., graph types share common schemas). Furthermore, it was of interest which graph type (bar, line, or pie) is optimal for comparing discrete groups. A switching paradigm was used in three experiments. Two graph types were examined at a time (Experiment 1: bar vs. line, Experiment 2: bar vs. pie, Experiment 3: line vs. pie). On each trial, participants received a data graph presenting the data from three groups and were to determine the numerical difference of group A and group B displayed in the graph. We scrutinized whether switching the type of graph from one trial to the next prolonged RTs. The slowing of RTs in switch trials in comparison to trials with only one graph type can indicate to what extent the graph schemas differ. As switch costs were observed in all pairings of graph types, none of the different pairs of graph types tested seems to fully share a common schema. Interestingly, there was tentative evidence for differences in switch costs among different pairings of graph types. Smaller switch costs in Experiment 1 suggested that the graph schemas of bar and line graphs overlap more strongly than those of bar graphs and pie graphs or line graphs and pie graphs. This implies that results were not in line with completely distinct schemas for different graph types either. Taken together, the pattern of results is consistent with a hierarchical view according to which a graph schema consists of parts shared for different graphs and parts that are specific for each graph type. Apart from investigating graph schemas, the study provided evidence for performance differences among graph types. We found that bar graphs yielded the fastest group comparisons compared to line graphs and pie graphs, suggesting that they are the most suitable when used to compare discrete groups.

  2. Data from: Comparing temporal graphs using dynamic time warping

    • resodate.org
    Updated Mar 12, 2021
    Cite
    Vincent Froese; Brijnesh Jain; Rolf Niedermeier; Malte Renken (2021). Comparing temporal graphs using dynamic time warping [Dataset]. http://doi.org/10.14279/depositonce-11602
    Explore at:
    Dataset updated
    Mar 12, 2021
    Dataset provided by
    Technische Universität Berlin
    DepositOnce
    Authors
    Vincent Froese; Brijnesh Jain; Rolf Niedermeier; Malte Renken
    Description

    Within many real-world networks, the links between pairs of nodes change over time. Thus, there has been a recent boom in studying temporal graphs. Recognizing patterns in temporal graphs requires a proximity measure to compare different temporal graphs. To this end, we propose to study dynamic time warping on temporal graphs. We define the dynamic temporal graph warping (dtgw) distance to determine the dissimilarity of two temporal graphs. Our novel measure is flexible and can be applied in various application domains. We show that computing the dtgw-distance is a challenging optimization problem that is NP-hard in general, and we identify some polynomial-time solvable special cases. Moreover, we develop a quadratic programming formulation and an efficient heuristic. In experiments on real-world data, we show that the heuristic performs very well and that our dtgw-distance performs favorably in de-anonymizing networks compared to other approaches.
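
    The dtgw-distance couples time warping with a vertex mapping between the two graphs; as a minimal illustration of the plain dynamic-time-warping recursion it builds on (an illustrative Python sketch on numeric sequences, not the paper's algorithm):

    import math

    def dtw(a, b, dist=lambda x, y: abs(x - y)):
        # Classic dynamic time warping between two sequences: each cell
        # extends the cheapest of the three admissible warp steps.
        n, m = len(a), len(b)
        D = [[math.inf] * (m + 1) for _ in range(n + 1)]
        D[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                D[i][j] = dist(a[i - 1], b[j - 1]) + min(
                    D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
        return D[n][m]

    print(dtw([1, 2, 3, 4], [1, 3, 4]))  # 1.0: only the 2 lacks an exact match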

  3. Data from: Supporting Middle School Students’ Understanding of Time-Series...

    • tandf.figshare.com
    bin
    Updated Nov 5, 2025
    Cite
    Jan Mokros; Jacob Sagrans; Pendred Noyce (2025). Supporting Middle School Students’ Understanding of Time-Series Data With Graph Comparisons [Dataset]. http://doi.org/10.6084/m9.figshare.30096410.v2
    Explore at:
    bin
    Dataset updated
    Nov 5, 2025
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Jan Mokros; Jacob Sagrans; Pendred Noyce
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    After participating in an afterschool program where they used the Common Online Data Analysis Platform (CODAP) to study time-series data about infectious diseases, four middle school students were interviewed to determine how they understood features of and trends within these graphs. Our focus was on how students compared graphs. Students were readily able to compare cumulative/total infection rates between two countries with differently sized populations. It was more challenging for them to link a graph of yearly cases to the corresponding graph of cumulative cases. Students offered reasonable interpretations for spikes or steady periods in the graphs. Time-series graphs are accessible for 11- to 14-year-old students, who were able to make comparisons within and between graphs. Students used proportional reasoning for one comparison task; on the other task, while it was challenging, they were beginning to understand how yearly and cumulative graphs were related. Time-series graphs are ubiquitous and socially relevant: students should study time-series data more regularly in school, and more research is needed on the progression of sense-making with these graphs.

  4. Data from: LauNuts: A Knowledge Graph to identify and compare geographic...

    • data.niaid.nih.gov
    • figshare.com
    • +1more
    Updated Mar 22, 2023
    Cite
    Wilke, Adrian; Ngonga, Axel (2023). LauNuts: A Knowledge Graph to identify and compare geographic regions in the European Union [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7734324
    Explore at:
    Dataset updated
    Mar 22, 2023
    Dataset provided by
    DICE, Paderborn University
    Authors
    Wilke, Adrian; Ngonga, Axel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    European Union
    Description

    LauNuts is an RDF Knowledge Graph consisting of:

    Local Administrative Units (LAU) and

    Nomenclature of Territorial Units for Statistics (NUTS)

    https://w3id.org/launuts

  5. Mix-and-Match Dataset

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Dec 12, 2020
    Cite
    Verstraaten, Merijn (2020). Mix-and-Match Dataset [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_4317448
    Explore at:
    Dataset updated
    Dec 12, 2020
    Dataset provided by
    University of Amsterdam
    Authors
    Verstraaten, Merijn
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Benchmark results for "Mix-and-Match: A Model-driven Runtime Optimisation Strategy for BFS on GPUs" paper.

    Performance data for Breadth-First Search on NVidia TitanX, including a trained binary decision tree model for predicting the best implementation on an input graph.

  6. Group Bar Chart

    • kaggle.com
    zip
    Updated Oct 2, 2021
    Cite
    AKV (2021). Group Bar Chart [Dataset]. https://www.kaggle.com/vermaamitesh/group-bar-chart
    Explore at:
    zip (45858 bytes)
    Dataset updated
    Oct 2, 2021
    Authors
    AKV
    Description

    Matplotlib is a tremendous visualization library in Python for 2D plots of arrays. Matplotlib is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack. It was introduced by John Hunter in 2002.

    A bar plot or bar graph is a graph that represents categorical data with rectangular bars whose lengths or heights are proportional to the values they represent. Bar plots can be drawn horizontally or vertically.

    A bar chart is a great way to compare categorical data across one or two dimensions. More often than not, it is more interesting to compare values across two dimensions, and for that, a grouped bar chart is needed.
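
    A minimal matplotlib sketch of such a grouped bar chart (the category names and values are made up for illustration, not taken from this dataset):

    import numpy as np
    import matplotlib.pyplot as plt

    categories = ["A", "B", "C"]          # hypothetical example data
    group1 = [20, 34, 30]
    group2 = [25, 32, 34]

    x = np.arange(len(categories))        # one slot per category
    width = 0.35                          # width of each bar

    fig, ax = plt.subplots()
    ax.bar(x - width / 2, group1, width, label="Group 1")
    ax.bar(x + width / 2, group2, width, label="Group 2")
    ax.set_xticks(x)
    ax.set_xticklabels(categories)
    ax.set_ylabel("Value")
    ax.legend()
    plt.show()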

  7. NetVotes ENIC Dataset

    • zenodo.org
    txt, zip
    Updated Oct 1, 2024
    Cite
    Israel Mendonça; Vincent Labatut; Vincent Labatut; Rosa Figueiredo; Rosa Figueiredo; Israel Mendonça (2024). NetVotes ENIC Dataset [Dataset]. http://doi.org/10.5281/zenodo.6815510
    Explore at:
    zip, txt
    Dataset updated
    Oct 1, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Israel Mendonça; Vincent Labatut; Vincent Labatut; Rosa Figueiredo; Rosa Figueiredo; Israel Mendonça
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description. The NetVote dataset contains the outputs of the NetVote program when applied to voting data coming from VoteWatch (http://www.votewatch.eu/).

    These results were used in the following conference papers:

    1. I. Mendonça, R. Figueiredo, V. Labatut, and P. Michelon, “Relevance of Negative Links in Graph Partitioning: A Case Study Using Votes From the European Parliament,” in 2nd European Network Intelligence Conference, 2015, pp. 122–129. ⟨hal-01176090⟩ DOI: 10.1109/ENIC.2015.25
    2. I. Mendonça, R. Figueiredo, V. Labatut, and P. Michelon, “Informative Value of Negative Links for Graph Partitioning, with an application to European Parliament Votes,” in 6ème Conférence sur les modèles et l'analyse de réseaux : approches mathématiques et informatiques, 2015, p. 12p. ⟨hal-02055158⟩

    Source code. The NetVote source code is available on GitHub: https://github.com/CompNet/NetVotes.

    Citation. If you use our dataset or tool, please cite article [1] above.


    @InProceedings{Mendonca2015,
      author    = {Mendonça, Israel and Figueiredo, Rosa and Labatut, Vincent and Michelon, Philippe},
      title     = {Relevance of Negative Links in Graph Partitioning: A Case Study Using Votes From the {E}uropean {P}arliament},
      booktitle = {2\textsuperscript{nd} European Network Intelligence Conference ({ENIC})},
      year      = {2015},
      pages     = {122-129},
      address   = {Karlskrona, SE},
      publisher = {IEEE Publishing},
      doi       = {10.1109/ENIC.2015.25},
    }

    -------------------------

    Details. This archive contains the following folders:

    • `votewatch_data`: the raw data extracted from the VoteWatch website.
      • `VoteWatch Europe European Parliament, Council of the EU.csv`: list of the documents voted during the considered term, with some details such as the date and topic.
      • `votes_by_document`: this folder contains a collection of CSV files, each one describing the outcome of the vote session relative to one specific document.
      • `intermediate_files`: this folder contains several CSV files:
        • `allvotes.csv`: concatenation of all vote outcomes for all documents and all MEPs. Can be considered a compact representation of the data contained in the folder `votes_by_document`.
        • `loyalty.csv`: same as `allvotes.csv`, but for loyalty (i.e. whether or not the MEP voted like the majority of the MEPs in their political group).
        • `MPs.csv`: list of the MEPs having voted at least once in the considered term, with their details.
        • `policies.csv`: list of the topics considered during the term.
        • `qtd_docs.csv`: list of the topics with the corresponding number of documents.
    • `parallel_ils_results`: contains the raw results of the ILS tool. This is an external algorithm able to estimate the optimal partition of the network nodes in terms of structural balance. It was applied to all the networks extracted by our scripts (from the VoteWatch data), and the produced files were placed here for postprocessing. Each subfolder corresponds to one topic-year pair.
    • `output_files`: contains the files produced by our scripts.
      • `agreement`: histograms representing the distributions of agreement and rebellion indices. Each subfolder corresponds to a specific topic.
      • `community_algorithms_csv`: performances obtained by the partitioning algorithms (for both community detection and correlation clustering). Each subfolder corresponds to a specific topic.
        • `xxxx_cluster_information.csv`: table containing several variants of the imbalance measure, for the considered algorithms.
      • `community_algorithms_results`: comparison of the partitions detected by the various algorithms considered, and distribution of the cluster/community sizes. Each subfolder corresponds to a specific topic.
        • `xxxx_cluster_comparison.csv`: table comparing the partitions detected by the community detection algorithms, in terms of Rand index and other measures.
        • `xxxx_ils_cluster_comparison.csv`: like `xxxx_cluster_comparison.csv`, except we compare the partition of community detection algorithms with that of the ILS.
        • `xxxx_yyyy_distribution.pdf`: histogram of the community (or cluster) sizes detected by algorithm `yyyy`.
      • `graphs`: the networks extracted from the vote data. Each subfolder corresponds to a specific topic.
        • `xxxx_complete_graph.graphml`: network in GraphML format, with all the information: nodes, edges, nodal attributes (including communities), weights, etc.
        • `xxxx_edges_Gephi.csv`: only the links, with their weights (i.e. vote similarity).
        • `xxxx_graph.g`: network in the g format (for ILS).
        • `xxxx_net_measures.csv`: table containing some stats on the network (number of links, etc.).
        • `xxxx_nodes_Gephi.csv`: list of nodes (i.e. MEPs), with details.
      • `plots`: synthesis plots from the paper.

    -------------------------

    License. These data are shared under a Creative Commons 0 license.

    Contact. Vincent Labatut <vincent.labatut@univ-avignon.fr> & Rosa Figueiredo <rosa.figueiredo@univ-avignon.fr>

  8. QADO: An RDF Representation of Question Answering Datasets and their...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    zip
    Updated May 31, 2023
    Cite
    Andreas Both; Oliver Schmidtke; Aleksandr Perevalov (2023). QADO: An RDF Representation of Question Answering Datasets and their Analyses for Improving Reproducibility [Dataset]. http://doi.org/10.6084/m9.figshare.21750029.v3
    Explore at:
    zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Andreas Both; Oliver Schmidtke; Aleksandr Perevalov
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Measuring the quality of Question Answering (QA) systems is a crucial task to validate the results of novel approaches. However, there are already indicators of a reproducibility crisis, as many published systems have used outdated datasets or subsets of QA benchmarks, making it hard to compare results. We identified the following core problems: there is no standard data format; instead, proprietary data representations are used by the different, partly inconsistent datasets. Additionally, the characteristics of datasets are typically reflected neither by the dataset maintainers nor by the system publishers. To overcome these problems, we established an ontology, the Question Answering Dataset Ontology (QADO), for representing QA datasets in RDF. The following datasets were mapped into the ontology: the QALD series, LC-QuAD series, RuBQ series, ComplexWebQuestions, and Mintaka. Hence, the integrated data in QADO covers widely used datasets and multilinguality. Additionally, we did intensive analyses of the datasets to identify their characteristics, to make it easier for researchers to identify specific research questions and to select well-defined subsets. The provided resource will enable the research community to improve the quality of their research and support the reproducibility of experiments.

    Here, the mapping results of the QADO process, the SPARQL queries for data analytics, and the archived analytics results file are provided.

    Up-to-date statistics can be created automatically by the script provided at the corresponding QADO GitHub RDFizer repository.

  9. Bridges of Pittsburgh

    • kilthub.cmu.edu
    application/gzip
    Updated May 30, 2023
    Cite
    Matthew Lincoln; Scott B. Weingart; Emma Slayton; Jessica Otis (2023). Bridges of Pittsburgh [Dataset]. http://doi.org/10.1184/R1/8276171.v1
    Explore at:
    application/gzip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Carnegie Mellon University
    Authors
    Matthew Lincoln; Scott B. Weingart; Emma Slayton; Jessica Otis
    License

    GNU GPL 3.0: https://www.gnu.org/licenses/gpl-3.0.html

    Area covered
    Pittsburgh
    Description

    The Bridges of Pittsburgh is a highly interdisciplinary and collaborative public-facing project that pays homage both to an innovative, field-defining mathematical problem and to one of the defining features of our city. We proposed to discover how many of Pittsburgh’s 446 bridges could be traversed without crossing the same bridge twice, in the process addressing issues in processing crowdsourced GIS data, performing graph traversal with complex constraints, and using network analysis to compare communities formed by this road network to the historically-defined neighborhoods of Pittsburgh.

    This ZIP file contains an RStudio project, with package dependencies bundled via packrat (https://rstudio.github.io/packrat/).

    • The osmar/ directory contains OSM data, our processing code, and outputs used to generate the map at https://bridgesofpittsburgh.net
    • 2019_final_community_analysis/ contains code and derived datasets for the community analysis portion of the project
    • The legacy/ directory contains experimental datasets and code from the earliest phase of this project, which were later superseded by the main pipeline in the osmar/ directory.

    Each directory contains further README.md files documenting their structure.

  10. The banksia plot: a method for visually comparing point estimates and...

    • bridges.monash.edu
    • datasetcatalog.nlm.nih.gov
    • +1more
    txt
    Updated Oct 15, 2024
    Cite
    Simon Turner; Amalia Karahalios; Elizabeth Korevaar; Joanne E. McKenzie (2024). The banksia plot: a method for visually comparing point estimates and confidence intervals across datasets [Dataset]. http://doi.org/10.26180/25286407.v2
    Explore at:
    txt
    Dataset updated
    Oct 15, 2024
    Dataset provided by
    Monash University
    Authors
    Simon Turner; Amalia Karahalios; Elizabeth Korevaar; Joanne E. McKenzie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Companion data for the creation of a banksia plot.

    Background: In research evaluating statistical analysis methods, a common aim is to compare point estimates and confidence intervals (CIs) calculated from different analyses. This can be challenging when the outcomes (and their scale ranges) differ across datasets. We therefore developed a plot to facilitate pairwise comparisons of point estimates and confidence intervals from different statistical analyses both within and across datasets.

    Methods: The plot was developed and refined over the course of an empirical study. To compare results from a variety of different studies, a system of centring and scaling is used. Firstly, the point estimates from reference analyses are centred to zero, followed by scaling confidence intervals to span a range of one. The point estimates and confidence intervals from matching comparator analyses are then adjusted by the same amounts. This enables the relative positions of the point estimates and CI widths to be quickly assessed while maintaining the relative magnitudes of the difference in point estimates and confidence interval widths between the two analyses. Banksia plots can be graphed in a matrix, showing all pairwise comparisons of multiple analyses. In this paper, we show how to create a banksia plot and present two examples: the first relates to an empirical evaluation assessing the difference between various statistical methods across 190 interrupted time series (ITS) data sets with widely varying characteristics, while the second example assesses data extraction accuracy comparing results obtained from analysing original study data (43 ITS studies) with those obtained by four researchers from datasets digitally extracted from graphs from the accompanying manuscripts.

    Results: In the banksia plot of statistical method comparison, it was clear that there was no difference, on average, in point estimates and it was straightforward to ascertain which methods resulted in smaller, similar or larger confidence intervals than others. In the banksia plot comparing analyses from digitally extracted data to those from the original data it was clear that both the point estimates and confidence intervals were all very similar among data extractors and original data.

    Conclusions: The banksia plot, a graphical representation of centred and scaled confidence intervals, provides a concise summary of comparisons between multiple point estimates and associated CIs in a single graph. Through this visualisation, patterns and trends in the point estimates and confidence intervals can be easily identified. This collection of files allows the user to create the images used in the companion paper and amend this code to create their own banksia plots using either Stata version 17 or R version 4.3.1.
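
    A minimal Python sketch of the centring-and-scaling step described above (the companion files implement this in Stata 17 and R 4.3.1; the numbers here are made up):

    def centre_and_scale(ref_est, ref_lo, ref_hi, comp_est, comp_lo, comp_hi):
        # Centre the reference estimate to zero and scale its CI to width one,
        # then apply the same shift and scale to the comparator analysis.
        shift = ref_est
        scale = ref_hi - ref_lo
        t = lambda x: (x - shift) / scale
        return ([t(v) for v in (ref_est, ref_lo, ref_hi)],
                [t(v) for v in (comp_est, comp_lo, comp_hi)])

    ref, comp = centre_and_scale(1.2, 0.8, 1.6, 1.5, 0.9, 2.1)
    print(ref)   # [0.0, -0.5, 0.5]: centred at zero, CI spans one
    print(comp)  # [0.375, -0.375, 1.125]: comparator on the same relative scale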

  11. Data from: FunQG: Molecular Representation Learning via Quotient Graphs

    • acs.figshare.com
    zip
    Updated May 31, 2023
    Cite
    Hossein Hajiabolhassan; Zahra Taheri; Ali Hojatnia; Yavar Taheri Yeganeh (2023). FunQG: Molecular Representation Learning via Quotient Graphs [Dataset]. http://doi.org/10.1021/acs.jcim.3c00445.s002
    Explore at:
    zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    ACS Publications
    Authors
    Hossein Hajiabolhassan; Zahra Taheri; Ali Hojatnia; Yavar Taheri Yeganeh
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    To accurately predict molecular properties, it is important to learn expressive molecular representations. Graph neural networks (GNNs) have made significant advances in this area, but they often face limitations like neighbors-explosion, under-reaching, oversmoothing, and oversquashing. Additionally, GNNs tend to have high computational costs due to their large number of parameters. These limitations emerge or increase when dealing with larger graphs or deeper GNN models. One potential solution is to simplify the molecular graph into a smaller, richer, and more informative one on which GNNs are easier to train. Our proposed molecular graph coarsening framework, FunQG, uses functional groups as building blocks to determine a molecule’s properties, based on a graph-theoretic concept called the quotient graph. We show through experiments that the resulting informative graphs are much smaller than the original molecular graphs and are thus more suitable for training GNNs. We apply FunQG to popular molecular property prediction benchmarks and compare the performance of popular baseline GNNs on the resulting data sets to that of state-of-the-art baselines on the original data sets. Our experiments demonstrate that FunQG yields notable results on various data sets while dramatically reducing the number of parameters and computational costs. By utilizing functional groups, we can achieve an interpretable framework that indicates their significant role in determining the properties of molecular quotient graphs. Consequently, FunQG is a straightforward, computationally efficient, and generalizable solution for addressing the molecular representation learning problem.

  12. S&P 500 stock data

    • kaggle.com
    zip
    Updated Feb 10, 2018
    Cite
    Cam Nugent (2018). S&P 500 stock data [Dataset]. https://www.kaggle.com/camnugent/sandp500
    Explore at:
    zip (20283917 bytes)
    Dataset updated
    Feb 10, 2018
    Authors
    Cam Nugent
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Stock market data can be interesting to analyze and, as a further incentive, strong predictive models can have a large financial payoff. The amount of financial data on the web is seemingly endless. A large and well-structured dataset on a wide array of companies can be hard to come by. Here I provide a dataset with historical stock prices (last 5 years) for all companies currently found on the S&P 500 index.

    The script I used to acquire all of these .csv files can be found in this GitHub repository. In the future, if you wish for a more up-to-date dataset, this can be used to acquire new versions of the .csv files.

    Feb 2018 note: I have just updated the dataset to include data up to Feb 2018. I have also accounted for changes in the stocks on the S&P 500 index (RIP whole foods etc. etc.).

    Content

    The data is presented in a couple of formats to suit different individuals' needs or computational limitations. I have included files containing 5 years of stock data (in the all_stocks_5yr.csv and corresponding folder).

    The folder individual_stocks_5yr contains files of data for individual stocks, labelled by their stock ticker name. The all_stocks_5yr.csv contains the same data, presented in a merged .csv file. Depending on the intended use (graphing, modelling etc.) the user may prefer one of these given formats.

    All the files have the following columns:

    Date - in format: yy-mm-dd

    Open - price of the stock at market open (this is NYSE data so all in USD)

    High - Highest price reached in the day

    Low - Lowest price reached in the day

    Close - price of the stock at market close

    Volume - Number of shares traded

    Name - the stock's ticker name

    Acknowledgements

    Due to volatility in google finance, for the newest version I have switched over to acquiring the data from The Investor's Exchange api; the simple script I use to do this is found here. Special thanks to Kaggle, Github, pandas_datareader and The Market.

    Inspiration

    This dataset lends itself to some very interesting visualizations. One can look at simple things like how prices change over time, graph and compare multiple stocks at once, or generate and graph new metrics from the data provided. From these data, informative stock stats such as volatility and moving averages can be easily calculated. The million dollar question is: can you develop a model that can beat the market and allow you to make statistically informed trades!
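
    As a quick pandas sketch of the moving-average and volatility calculations mentioned above (lower-case column names are assumed here; adjust to the actual headers in the file):

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("all_stocks_5yr.csv", parse_dates=["date"])

    # One ticker, sorted chronologically
    aapl = df[df["Name"] == "AAPL"].sort_values("date").set_index("date")

    aapl["ma20"] = aapl["close"].rolling(window=20).mean()  # 20-day moving average
    daily_ret = aapl["close"].pct_change()                  # daily returns
    print("annualized volatility:", daily_ret.std() * 252 ** 0.5)

    aapl[["close", "ma20"]].plot()  # compare price against its moving average
    plt.show()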

  13. Data from: InterpretSELDM version 1.0 The Stochastic Empirical Loading and...

    • datasets.ai
    • data.usgs.gov
    • +2more
    55
    Updated Jun 1, 2023
    + more versions
    Cite
    Department of the Interior (2023). InterpretSELDM version 1.0 The Stochastic Empirical Loading and Dilution Model (SELDM) output interpreter [Dataset]. https://datasets.ai/datasets/interpretseldm-version-1-0-the-stochastic-empirical-loading-and-dilution-model-seldm-outpu
    Explore at:
    55
    Dataset updated
    Jun 1, 2023
    Dataset authored and provided by
    Department of the Interior
    Description

    The InterpretSELDM program is a graphical post processor designed to facilitate analysis and presentation of stormwater modeling results from the Stochastic Empirical Loading and Dilution Model (SELDM), which is a stormwater model developed by the U.S. Geological Survey in cooperation with the Federal Highway Administration. SELDM simulates flows, concentrations, and loads in stormflows from upstream basins, the highway, best management practice outfalls, and in the receiving water downstream of a highway. SELDM is designed to transform complex scientific data into meaningful information about (1) the risk of adverse effects from stormwater runoff on receiving waters, (2) the potential need for mitigation measures, and (3) the potential effectiveness of management measures for reducing those risks.

    SELDM produces results in (relatively) easy-to-use tab delimited output files that are designed for use with spreadsheets and graphing packages. However, time is needed to learn, understand, and use the SELDM output formats. Also, the SELDM output requires post-processing to extract the specific information that commonly is of interest to the user (for example, the percentage of storms above a user-specified value). Because SELDM output files are comprehensive, the locations of specific output values may not be obvious to the novice user or the occasional model user who does not consult the detailed model documentation.

    The InterpretSELDM program was developed as a postprocessor to facilitate analysis and presentation of SELDM results. The program provides graphical results and tab-delimited text summaries from simulation results. InterpretSELDM provides data summaries in seconds. In comparison, manually extracting the same information from SELDM outputs could take minutes to hours. It has an easy-to-use graphical user interface designed to quickly extract dilution factors, constituent concentrations, annual loads, and annual yields from all analyses within a SELDM project. The program provides the methods necessary to create scatterplots and boxplots for the extracted results.

    Graphs are more effective than tabular data for evaluating and communicating risk-based information to technical and nontechnical audiences. Commonly used spreadsheets provide methods for generating graphs, but do not provide probability-plots or boxplots, which are useful for examining extreme stormflow, concentration, and load values. Probability plot axes are necessary for evaluating stormflow information because the extreme values commonly are the values of concern. Boxplots provide a simple visual summary of results that can be used to compare different simulation results. The graphs created by using the InterpretSELDM program can be copied and pasted into word processors, spreadsheets, drawing software, and other programs. The graphs also can be saved in commonly used image-file formats.

  14. Data from: The role of spatial embedding in mouse brain networks constructed...

    • knowledge.uchicago.edu
    csv, py, zip
    Updated 2021
    Cite
    Trinkle, Scott (2021). The role of spatial embedding in mouse brain networks constructed from diffusion tractography and tracer injections [Dataset]. http://doi.org/10.6082/uchicago.3310
    Explore at:
    zip (61266042), zip (776798548), csv (4959773), py (4375)
    Dataset updated
    2021
    Dataset provided by
    Knowledge@UChicago
    Authors
    Trinkle, Scott
    Description

    Diffusion MRI tractography is the only noninvasive method to measure the structural connectome in humans. However, recent validation studies have revealed limitations of modern tractography approaches, which lead to significant mistracking caused in part by local uncertainties in fiber orientations that accumulate to produce larger errors for longer streamlines. Characterizing the role of this length bias in tractography is complicated by the true underlying contribution of spatial embedding to brain topology. In this work, we compare graphs constructed with ex vivo tractography data in mice and neural tracer data from the Allen Mouse Brain Connectivity Atlas to random geometric surrogate graphs which preserve the low-order distance effects from each modality in order to quantify the role of geometry in various network properties. We find that geometry plays a substantially larger role in determining the topology of graphs produced by tractography than graphs produced by tracers. Tractography underestimates weights at long distances compared to neural tracers, which leads tractography to place network hubs close to the geometric center of the brain, as do corresponding tractography-derived random geometric surrogates, while tracer graphs place hubs further into peripheral areas of the cortex. We also explore the role of spatial embedding in modular structure, network efficiency and other topological measures in both modalities. Throughout, we compare the use of two different tractography streamline node assignment strategies and find that the overall differences between tractography approaches are small relative to the differences between tractography- and tracer-derived graphs. These analyses help quantify geometric biases inherent to tractography and promote the use of geometric benchmarking in future tractography validation efforts.

  15. Collaborative Problem Solving in Mixed Reality: A Study on Visual Graph...

    • darus.uni-stuttgart.de
    Updated Oct 6, 2025
    Cite
    Dimitar Garkov; Tommaso Piselli; Emilio Di Giacomo; Karsten Klein; Giuseppe Liotta; Fabrizio Montecchiani; Falk Schreiber (2025). Collaborative Problem Solving in Mixed Reality: A Study on Visual Graph Analysis - Replication data [Dataset]. http://doi.org/10.18419/DARUS-4231
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 6, 2025
    Dataset provided by
    DaRUS
    Authors
    Dimitar Garkov; Tommaso Piselli; Emilio Di Giacomo; Karsten Klein; Giuseppe Liotta; Fabrizio Montecchiani; Falk Schreiber
    License

    https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.18419/DARUS-4231

    Dataset funded by
    DFG
    Description

    This dataset contains the supplementary materials to our publication "Collaborative Problem Solving in Mixed Reality: A Study on Visual Graph Analysis", where we report on a study we conducted. Please refer to the publication for more details; the abstract can be found at the end of this description. The dataset contains:

    • The collection of graphs with layout used in the study
    • The final, randomized experiment files used in the study
    • The source code of the study prototype
    • The collected, anonymized data in tabular form
    • The code for the statistical analysis
    • The Supplemental Materials PDF
    • The documents used in the study procedure (English, Italian, German)

    Paper abstract: Problem solving is a composite cognitive process, invoking a number of cognitive mechanisms, such as perception and memory. Individuals may form collectives to solve a given problem together, in collaboration, especially when complexity is thought to be high. To determine if and when collaborative problem solving is desired, we must quantify collaboration first. For this, we investigate the practical virtue of collaborative problem solving. Using visual graph analysis, we perform a study with 72 participants in two countries and three languages. We compare ad hoc pairs to individuals and nominal pairs, solving two different tasks on graphs in visuospatial mixed reality. The average collaborating pair does not outdo its nominal counterpart, but it does have a significant trade-off against the individual: an ad hoc pair uses 1.46 more time to achieve 4.6% higher accuracy. We also use the concept of task instance complexity to quantify differences in complexity. As task instance complexity increases, these differences largely scale, though with two notable exceptions. With this study we show the importance of using nominal groups as benchmark in collaborative virtual environments research. We conclude that a mixed reality environment does not automatically imply superior collaboration.

  16. Stocks Data- Individual stock 5 years

    • kaggle.com
    zip
    Updated Sep 7, 2022
    Cite
    singole (2022). Stocks Data- Individual stock 5 years [Dataset]. https://www.kaggle.com/datasets/singole/stocks-data-individual-stock-5-years
    Explore at:
    zip (10270219 bytes)
    Dataset updated
    Sep 7, 2022
    Authors
    singole
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    About Dataset

    Context

    Stock market data can be interesting to analyze and, as a further incentive, strong predictive models can have a large financial payoff. The amount of financial data on the web is seemingly endless. A large and well-structured dataset on a wide array of companies can be hard to come by. Here I provide a dataset with historical stock prices (last 5 years) for all companies currently found on the S&P 500 index.

    The script I used to acquire all of these .csv files can be found in this GitHub repository. In the future, if you wish for a more up-to-date dataset, this can be used to acquire new versions of the .csv files.

    Feb 2018 note: I have just updated the dataset to include data up to Feb 2018. I have also accounted for changes in the stocks on the S&P 500 index (RIP whole foods etc. etc.).

    Content

    The data is presented in a couple of formats to suit different individuals' needs or computational limitations. I have included files containing 5 years of stock data (in the all_stocks_5yr.csv and corresponding folder).

    The folder individual_stocks_5yr contains files of data for individual stocks, labelled by their stock ticker name. The all_stocks_5yr.csv contains the same data, presented in a merged .csv file. Depending on the intended use (graphing, modelling etc.) the user may prefer one of these given formats.

    All the files have the following columns:

    Date - in format: yy-mm-dd

    Open - price of the stock at market open (this is NYSE data so all in USD)

    High - Highest price reached in the day

    Low - Lowest price reached in the day

    Close - price of the stock at market close

    Volume - Number of shares traded

    Name - the stock's ticker name

    Acknowledgements

    Due to volatility in google finance, for the newest version I have switched over to acquiring the data from The Investor's Exchange api; the simple script I use to do this is found here. Special thanks to Kaggle, Github, pandas_datareader and The Market.

    Inspiration

    This dataset lends itself to some very interesting visualizations. One can look at simple things like how prices change over time, graph and compare multiple stocks at once, or generate and graph new metrics from the data provided. From these data, informative stock stats such as volatility and moving averages can be easily calculated. The million dollar question is: can you develop a model that can beat the market and allow you to make statistically informed trades!

  17. End-to-End Response Time by Input Token Count by Model

    • artificialanalysis.ai
    Updated Jan 15, 2024
    + more versions
    Cite
    Artificial Analysis (2024). End-to-End Response Time by Input Token Count by Models Model [Dataset]. https://artificialanalysis.ai/models
    Explore at:
    Dataset updated
    Jan 15, 2024
    Dataset authored and provided by
    Artificial Analysis
    Description

    Comparison of seconds to output 500 tokens (including reasoning model 'thinking' time) by model; lower is better.

  18. NBA Rookies Performance Statistics and Minutes

    • kaggle.com
    zip
    Updated Jan 15, 2023
    Cite
    The Devastator (2023). NBA Rookies Performance Statistics and Minutes [Dataset]. https://www.kaggle.com/datasets/thedevastator/nba-rookies-performance-statistics-and-minutes-p
    Explore at:
    zip (126219 bytes)
    Dataset updated
    Jan 15, 2023
    Authors
    The Devastator
    Description

    NBA Rookies Performance Statistics and Minutes Played: 1980-2016

    Tracking Basketball Prodigies' Growth and Achievements

    By Gabe Salzer [source]

    About this dataset

    This dataset contains essential performance statistics for NBA rookies from 1980-2016. Here you can find minutes-per-game stats, points scored, field goals made and attempted, three-pointers made and attempted, free throws made and attempted (with the respective percentages for each), offensive rebounds, defensive rebounds, assists, steals, blocks, turnovers, efficiency rating, and Hall of Fame induction year. It is organized in descending order by minutes played per game as well as draft year. This Kaggle dataset is an excellent resource for basketball analysts to gain a better understanding of how rookies have evolved over the years, from their stats to how they were inducted into the Hall of Fame. With its great detail on individual players' performance data, this dataset allows you to compare their performances against different eras in NBA history along with overall trends in rookie statistics. Compare rookies drafted far apart or those that played together, whatever your goal may be!

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    This dataset is perfect for providing insight into the performance of NBA rookies over an extended period of time. The data covers rookie stats from 1980 to 2016 and includes statistics such as points scored, field goals made, free throw percentage, offensive rebounds, defensive rebounds and assists. It also provides the name of each rookie along with the year they were drafted and their Hall of Fame class.

    This data set is useful for researching how rookies’ stats have changed over time in order to compare different eras or identify trends in player performance. It can also be used to evaluate players by comparing their stats against those of other players or previous years’ stats.

    In order to use this dataset effectively, a few tips are helpful:

    • Consider using Field Goal Percentage (FG%), Three Point Percentage (3P%) and Free Throw Percentage (FT%) to measure a player’s efficiency beyond just points scored or field goals made/attempted (FGM/FGA).

    • Look out for anomalies such as low efficiency ratings despite high minutes played, as this could indicate either that a player has not had enough playing time for their statistics to reach what would be their per-game average with more minutes, or that they simply did not play well over that short period with limited opportunities.

    • Try different visualizations with the data, such as histograms, line graphs and scatter plots, because each may offer different insights into varied aspects of the data set, like comparisons between individual years vs. aggregate trends over multiple years.

      Lastly, it is important to keep in mind whether you're dealing with cumulative totals over multiple seasons versus individual season averages or per-game numbers when attempting analysis on these sets! A short pandas sketch of these tips follows.
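
    This sketch illustrates the percentage tip above (the file name comes from the Columns section below; the stat column names FGM, FGA and MIN are hypothetical stand-ins and may differ in the actual file):

    import pandas as pd

    df = pd.read_csv("NBA Rookies by Year_Hall of Fame Class.csv")

    # FGM/FGA/MIN are assumed names for field goals made/attempted and minutes.
    df["FG%"] = df["FGM"] / df["FGA"]      # efficiency beyond raw points
    small_sample = df[df["MIN"] < 10]      # low minutes: treat these stats with caution
    print(df.sort_values("FG%", ascending=False).head())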

    Research Ideas

    • Evaluating the performance of historical NBA rookies over time and how this can help inform future draft picks in the NBA.
    • Analysing the relative importance of certain performance stats, such as three-point percentage, to overall success and Hall of Fame induction from 1980-2016.
    • Comparing rookie seasons across different years to identify common trends in terms of statistical contributions and development over time

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: Dataset copyright by authors. You are free to:
    • Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
    • Adapt: remix, transform, and build upon the material for any purpose, even commercially.
    You must:
    • Give appropriate credit: provide a link to the license, and indicate if changes were made.
    • ShareAlike: you must distribute your contributions under the same license as the original.
    • Keep intact: all notices that refer to this license, including copyright notices.

    Columns

    File: NBA Rookies by Year_Hall of Fame Class.csv

    | Column name | Description    |
    |:------------|:---------------|
    | Name        | The name of... |

  19. Domestic Earnings, Ratings, Titles, and Franchises

    • kaggle.com
    zip
    Updated Jan 16, 2023
    Cite
    The Devastator (2023). Domestic Earnings, Ratings, Titles, and Franchises [Dataset]. https://www.kaggle.com/datasets/thedevastator/domestic-earnings-ratings-titles-and-franchises/code
    Explore at:
    zip (8820 bytes)
    Dataset updated
    Jan 16, 2023
    Authors
    The Devastator
    Description

    Domestic Earnings, Ratings, Titles, and Franchises for Movies

    An In-Depth Investigation

    By Kiersten Rule [source]

    About this dataset

    This dataset provides insight into the performance of movie franchises, with detailed information on domestic earnings, ratings, and information on each movie. Featuring data from over a decade of films released in North America from 2005 - 2018, we've collected a wealth of data to help analyze the trends that have emerged over time. From film budgets to box-office grosses, vote averages to release dates - you can explore how various studios and movies have impacted the industry by mining this database. Analyze the success of your favorite franchises or compare different plots and themes across genres! So dive in and uncover what makes a movie franchise great!

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    • Compare movie franchises within the same studio – Look at trends such as average runtime or budget over time or compare one franchise to another (e.g., Marvel vs DC).

    • Analyze box office results by rating – It can be useful to compare which types of movies draw better audiences by looking at their respective box office totals per rating (e.g., R-rated vs PG-13). This can help you decide which genres do better within certain ratings systems that may be beneficial in targeting an audience with a similar demographic.

    • Use data visualization techniques – Manipulate and visualize the data set with charts and graphs to gain valuable insights into how certain movie characteristics influence overall success (e.g., use bar graphs and scatter plots to look at relationships between release year/budget/runtime etc).

    • Utilize release date analysis - This dataset gives you comprehensive information about when different movies were released, so you can use this information to analyze whether there are any benefits targeting particular months/seasons or avoiding them altogether (e.g., does Christmas offer greater success than summer for family films?).

    With these tips in mind, this dataset should provide helpful insights into an understanding of what factors contribute most significantly towards the success of both individual films and major movie franchises!
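
    A short pandas/matplotlib sketch of these visualization ideas, using the columns documented in the Columns section below (exact header spelling in the CSV may differ):

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("MovieFranchises.csv")

    # Budget vs. lifetime gross: does spending more pay off?
    df.plot.scatter(x="Budget", y="Lifetime Gross")
    plt.show()

    # Average lifetime gross per studio, as a bar chart
    df.groupby("Studio")["Lifetime Gross"].mean().plot.bar()
    plt.show()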

    Research Ideas

    • Analyzing the correlation between movie budget and lifetime gross earnings to determine optimum budgets for certain types of movies.
    • Tracking the average ratings and reviews over time to see if certain studios are consistently making quality films or if there is a decline in their ratings and reviews.
    • Comparing movie release dates against viewer ratings, reviews and lifetime gross revenue over time to determine which months of the year are most lucrative for releasing movies

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: MovieFranchises.csv

    | Column name    | Description                                                             |
    |:---------------|:------------------------------------------------------------------------|
    | Title          | The title of the movie. (String)                                        |
    | Lifetime Gross | The total amount of money the movie has earned domestically. (Integer)  |
    | Year           | The year the movie was released. (Integer)                              |
    | Studio         | The studio that produced the movie. (String)                            |
    | Rating         | The rating of the movie e.g. PG-13, R etc. (String)                     |
    | Runtime        | The length of the movie in minutes. (Integer)                           |
    | Budget         | The budget of the movie. (Integer)                                      |
    | ReleaseDate    | The date that the movie was released. (Date)                            |
    | VoteAvg        | Average rating from users. (Float)                                      |
    | VoteCount      | Total number of votes from users. (Integer)                             |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors; in particular, please credit Kiersten Rule.

  20. Redmob Identity Graph Data for Marketing Weekly Refreshes 300M+ Unique Pairs...

    • datarade.ai
    .json
    Updated Nov 22, 2025
    + more versions
    Cite
    Redmob (2025). Redmob Identity Graph Data for Marketing Weekly Refreshes 300M+ Unique Pairs [Dataset]. https://datarade.ai/data-products/redmob-identity-graph-data-for-marketing-weekly-refreshes-300-redmob
    Explore at:
    .json
    Dataset updated
    Nov 22, 2025
    Dataset authored and provided by
    Redmob
    Area covered
    United States of America
    Description

    Redmob's Identity Graph Data helps you bring fragmented user data into one unified view. Built in-house and refreshed weekly, the mobile identity graph connects online and offline identifiers.

    Designed for adtech platforms, brands, CRM, and CDP owners, Redmob enables cross-device audience tracking, deterministic identity resolution, and more precise attribution modeling across digital touchpoints.

    Use cases

    The Redmob Identity Graph is a mobile-centric database of linked identifiers that enables:

    • Cross-device matching to connect mobile, web, and offline behaviors
    • Enrich your CRM and CDP with stable IDs to improve marketing automation
    • Match mobile device IDs to emails, cookies, and offline data
    • Create lasting user profiles by connecting data from different channels
    • Enrich customer data for better segmentation and engagement

    Key benefits:

    • Connects users across devices with Redmob's in-house identity graph
    • Weekly updates keep audience profiles fresh and accurate
    • Links offline and online data to complete the user picture
    • Built for adtech with reliable, high-accuracy matches