GOOD is a systematic graph OOD benchmark, which provides carefully designed data environments for distribution shifts.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
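The claim that many different data distributions can lead to the same bar graph is easy to check. The Python sketch below uses two small, made-up samples (not data from the review) that have the same mean and the same sample SD, so a mean ± SD bar graph would look identical for both. A crude text dot plot, standing in for a univariate scatterplot, shows how different the underlying data are.

```python
import statistics
from collections import Counter

# Two hypothetical samples (n = 6 each), constructed so that they share
# the same mean (5) and the same sample standard deviation (sqrt(10.8)).
bimodal = [2, 2, 2, 8, 8, 8]
skewed = [1, 4, 4, 5, 5, 11]

def bar_graph_summary(data):
    """Return what a mean +/- SD bar graph would show: (mean, sample SD)."""
    return statistics.mean(data), statistics.stdev(data)

def text_dot_plot(data, lo=0, hi=12):
    """One character per integer value: '.' if absent, else the count."""
    counts = Counter(data)
    return "".join(str(counts[v]) if counts[v] else "." for v in range(lo, hi + 1))

for name, sample in [("bimodal", bimodal), ("skewed", skewed)]:
    mean, sd = bar_graph_summary(sample)
    print(f"{name:8s} mean={mean:.2f} sd={sd:.2f} median={statistics.median(sample)}")
    print(f"{'':8s} {text_dot_plot(sample)}")
```

The summary statistics match, but the dot plots (and even the medians) do not, which is exactly the information a bar graph hides.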
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Implementation of the visibility graph algorithm in the R language. These scripts generate visibility graphs from series built in RStudio or imported into RStudio; plot the series, the series histogram, and the degree distribution of the graphs generated from these series; determine the fit of the distribution curve on a log-log plot; and calculate the fundamental complex-network metrics for the visibility graph associated with each series.
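The scripts themselves are in R; as a language-neutral illustration, here is a minimal Python sketch of the standard natural visibility criterion and the resulting degree distribution. It is not a port of the R scripts: the function names and the toy series are hypothetical, and the naive O(n³) loop is only meant for short series.

```python
from collections import Counter

def visibility_graph(series):
    """Natural visibility graph: nodes are time indices; a and b are linked
    if every intermediate point lies strictly below the straight line
    joining (a, y_a) and (b, y_b)."""
    n = len(series)
    edges = set()
    for a in range(n):
        for b in range(a + 1, n):
            ya, yb = series[a], series[b]
            visible = all(
                series[c] < yb + (ya - yb) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                edges.add((a, b))
    return edges

def degree_distribution(n_nodes, edges):
    """Return {k: P(k)}, the fraction of nodes with degree k."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    counts = Counter(degree[v] for v in range(n_nodes))
    return {k: c / n_nodes for k, c in sorted(counts.items())}

series = [3.0, 1.0, 2.0, 0.5, 4.0]  # hypothetical toy series
edges = visibility_graph(series)
print(sorted(edges))
print(degree_distribution(len(series), edges))
```

Consecutive points are always connected, because no point lies between them; the strict inequality means a point exactly on the line of sight blocks visibility.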
A tracer breakthrough curve (BTC) for each sampling station is the ultimate goal of every quantitative hydrologic tracing study, and dataset size can critically affect the BTC. Groundwater-tracing data obtained using in situ automatic sampling or detection devices may result in very high-density datasets. Data-dense tracer BTCs obtained using in situ devices and stored in dataloggers can result in visually cluttered, overlapping data points. The relatively large amounts of data detected by the high-frequency settings available on in situ devices and stored in dataloggers ensure that important tracer BTC features, such as data peaks, are not missed. However, such dense datasets can also be difficult to interpret. Even more difficult is the application of such dense datasets in solute-transport models, which may not be able to adequately reproduce tracer BTC shapes because of the overwhelming mass of data. One solution to the difficulties of analyzing, interpreting, and modeling dense datasets is the selective removal of blocks of data from the total dataset. Although it is possible to skip blocks of tracer BTC data periodically (data decimation) to reduce the size and density of the dataset, skipping or deleting blocks of data may also result in missing the very features that the high-frequency detection settings were intended to capture. Rather than removing, reducing, or reformulating overlapping data, signal filtering and smoothing may be used, but smoothing errors (e.g., averaging errors, outliers, and potential time shifts) need to be considered. Appropriate probability distributions fitted to tracer BTCs may be used to describe typical tracer BTC shapes, which usually include long tails. Recognizing the probability distributions applicable to tracer BTCs can help in understanding some aspects of tracer migration. This dataset is associated with the following publications:
Field, M. Tracer-Test Results for the Central Chemical Superfund Site, Hagerstown, Md. May 2014 -- December 2015. U.S. Environmental Protection Agency, Washington, DC, USA, 2017.
Field, M. On Tracer Breakthrough Curve Dataset Size, Shape, and Statistical Distribution. Advances in Water Resources. Elsevier Science Ltd, New York, NY, USA, 141: 1-19 (2020).
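The trade-off between decimation and smoothing described above can be seen on a toy curve. The Python sketch below uses a made-up, symmetric BTC with a single sharp peak (not data from this dataset). Periodic decimation can skip the peak entirely. A centered moving average keeps the peak time but attenuates its height (an averaging error). A trailing (causal) moving average additionally shifts the peak later in time. The window sizes and helper names are illustrative assumptions, not the filtering used in the cited publications.

```python
def decimate(values, step):
    """Keep every `step`-th sample, starting with the first."""
    return values[::step]

def centered_moving_average(values, window):
    """Centered moving average with an odd window; shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def trailing_moving_average(values, window):
    """Causal moving average over the current and previous samples."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def argmax(values):
    """Index of the first maximum."""
    return max(range(len(values)), key=values.__getitem__)

# Hypothetical dense BTC: concentration per sample, sharp peak at index 5.
btc = [0, 0, 0, 1, 3, 10, 3, 1, 0, 0, 0, 0]

print("raw peak:", max(btc), "at index", argmax(btc))
print("decimated (step 2) peak:", max(decimate(btc, 2)))
cma = centered_moving_average(btc, 3)
print("centered MA peak:", round(max(cma), 2), "at index", argmax(cma))
tma = trailing_moving_average(btc, 3)
print("trailing MA peak:", round(max(tma), 2), "at index", argmax(tma))
```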
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Transparency in data visualization is an essential ingredient of scientific communication. The traditional approach of visualizing continuous quantitative data solely in the form of summary statistics (i.e., measures of central tendency and dispersion) has repeatedly been criticized for not revealing the underlying raw data distribution. Remarkably, however, systematic and easy-to-use solutions for raw data visualization are missing for the most commonly reported statistical software package for data analysis, IBM SPSS Statistics. Here, a comprehensive collection of more than 100 SPSS syntax files and an SPSS dataset template is presented and made freely available; together they allow the creation of transparent graphs for one-sample designs, for one- and two-factorial between-subject designs, for selected one- and two-factorial within-subject designs, and for selected two-factorial mixed designs, and, with some creativity, even beyond (e.g., three-factorial mixed designs). Depending on graph type (e.g., pure dot plot, box plot, and line plot), raw data can be displayed along with standard measures of central tendency (arithmetic mean and median) and dispersion (95% CI and SD). The free-to-use syntax can also be modified to match individual needs. A variety of example applications of the syntax are illustrated in a tutorial-like fashion, along with fictitious datasets accompanying this contribution. It is hoped that the syntax collection will provide researchers, students, teachers, and others working with SPSS with a valuable tool for moving towards more transparency in data visualization.
This dataset comprises temporal dynamic graph sequences generated from power grid simulations focused on grid reconfiguration to enhance resilience. The simulations model failure propagation under varying conditions, with nodes assigned distinct failure probabilities. For each time step, the dataset captures the evolution of node states (functional or failed) and features critical to grid operations, such as pv_output, load_profile, load_dispatch, dg_output, loss, and voltage. Node types include sources, normal loads, and nodes with specific equipment like PVs, micro turbines, or shunt capacitors. The dataset is structured to support the training of dynamic graph neural networks, facilitating research on node feature prediction and edge dynamics under failure scenarios. Three distinct configurations are included, providing a robust foundation for modeling power grid resilience.
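To make the idea of a temporal graph sequence with per-node failure probabilities concrete, here is a purely illustrative Python toy cascade. It is not the simulator used to build this dataset. The propagation rule (a functional node with at least one failed neighbour fails with its own probability at each step), the node names, and the probabilities are all hypothetical, and the electrical node features (pv_output, load_profile, voltage, and so on) are not modeled.

```python
import random

def propagate_failures(adjacency, fail_prob, initially_failed, steps, seed=0):
    """Toy cascade (hypothetical rule): at each step, every functional node
    with at least one failed neighbour fails with probability fail_prob[node].
    Returns one snapshot {node: "functional" | "failed"} per time step."""
    rng = random.Random(seed)
    failed = set(initially_failed)
    snapshots = []
    for _ in range(steps):
        snapshots.append(
            {v: ("failed" if v in failed else "functional") for v in adjacency}
        )
        newly_failed = {
            v for v in adjacency
            if v not in failed
            and any(u in failed for u in adjacency[v])
            and rng.random() < fail_prob[v]
        }
        failed |= newly_failed
    return snapshots

# Hypothetical radial feeder: a source, two load nodes and a PV node.
feeder = {"source": ["n1"], "n1": ["source", "n2", "pv"], "n2": ["n1"], "pv": ["n1"]}
probs = {"source": 0.05, "n1": 0.5, "n2": 0.4, "pv": 0.3}
for t, snap in enumerate(propagate_failures(feeder, probs, {"n1"}, steps=4)):
    print(t, sorted(v for v, s in snap.items() if s == "failed"))
```

Each snapshot is the node-state part of one graph in the sequence; in a dataset like this one, every snapshot would additionally carry the per-node feature vectors used to train dynamic graph neural networks.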
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
We list the degree distribution of the 15 main hub genes in the two graph prototypes. Each entry gives the number of genes with the corresponding number of neighbors.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The scale and complexity of relational data in critical domains like chemistry, neuroscience, and social media has ignited interest in graph neural networks as performant, expressive, flexible frameworks for solving problems specified over graphs. However, this performance comes at the cost of interpretability: the behavior of a typical neural network is, at best, a mystery. Graph grammars, in contrast, provide a symbolic, discrete, rule-based formalism for describing transformations between graphs. While profoundly interpretable, they are mired in the inductive biases and restrictive assumptions common to many traditional approaches to graph modeling.
This dissertation aims to narrow the gap between graph neural networks and graph grammars. The first contribution introduces Dynamic Vertex Replacement Grammars as a way of modeling temporal graph datasets with graph grammars. The second contribution proposes an analytically invertible normalizing flow network that learns prototypical probability distributions as intrinsic explanations of its behavior. The third contribution shows how the attention mechanism in graph neural networks induces grammars that can act as generative post hoc explainers.
This supports the thesis that discrete rules and continuous distributions are jointly critical to the future of machine learning.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Portugal - Income distribution was 5.20 in December of 2024, according to EUROSTAT. The income distribution ratio is the ratio of the total income received by the 20% of the population with the highest income to that received by the 20% of the population with the lowest income.
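The ratio defined above (often called the S80/S20 income quintile share ratio) is simple arithmetic once incomes are sorted. The Python sketch below is a simplification: the official indicator is computed from survey data on equivalised disposable income with sampling weights, whereas this version is unweighted and, for clarity, assumes a sample size that is a multiple of 5. The income values are hypothetical.

```python
def quintile_share_ratio(incomes):
    """S80/S20: total income of the top 20% divided by that of the bottom 20%.
    Simplified sketch: unweighted, sample size must be a multiple of 5."""
    if not incomes or len(incomes) % 5:
        raise ValueError("need a non-empty sample whose size is a multiple of 5")
    ordered = sorted(incomes)
    q = len(ordered) // 5
    return sum(ordered[-q:]) / sum(ordered[:q])

incomes = list(range(1, 11))  # hypothetical incomes 1..10
print(round(quintile_share_ratio(incomes), 2))
```

Here the bottom quintile holds 1 + 2 = 3 and the top quintile 9 + 10 = 19, so the ratio is 19 / 3. A value of 5.20 means the top fifth received 5.2 times as much total income as the bottom fifth.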
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This file presents the pore size distribution of a SiC ceramic structure, measured by mercury intrusion porosimetry.
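Mercury intrusion porosimetry converts each applied pressure P into an equivalent pore diameter with the Washburn equation, D = -4γ cos θ / P, where γ is the surface tension of mercury and θ its contact angle with the solid. The Python sketch below uses γ = 0.485 N/m and θ = 130°, which are commonly assumed values for mercury. The instrument settings actually used for this SiC sample are not stated here, so the defaults are assumptions.

```python
import math

def washburn_pore_diameter(pressure_pa, surface_tension=0.485, contact_angle_deg=130.0):
    """Washburn equation D = -4 * gamma * cos(theta) / P, with D in metres.
    Defaults are commonly assumed values for mercury, not taken from the dataset."""
    cos_theta = math.cos(math.radians(contact_angle_deg))
    return -4.0 * surface_tension * cos_theta / pressure_pa

for p_mpa in (0.1, 1.0, 10.0, 100.0):
    d_um = washburn_pore_diameter(p_mpa * 1e6) * 1e6
    print(f"{p_mpa:7.1f} MPa -> {d_um:8.3f} um")
```

Because D is inversely proportional to P, each tenfold increase in intrusion pressure probes pores ten times smaller. At 1 MPa these defaults give roughly 1.25 µm.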
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This zip contains three CSV files and one folder. The dataset contains information on the ten most recent distributions.
Here is the arXiv version of our paper: https://arxiv.org/abs/2101.08729.
Here is the portal link: https://sites.google.com/view/rima-hazra/swnet
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
A 4k dataset processed for machine-learning tasks with graph-based models. The dataset is divided into training, validation, in-distribution testing, and out-of-distribution testing subsets. Some subsets are split into multiple ZIP archives because of upload restrictions.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
In this MANUAL FOR VISIBILITY GRAPHS MODELING USING R-STUDIO, we first present basic notions needed to understand the mapping process, and then show the computational idea. Finally, we work with the R scripts inside RStudio, exploring pseudo-random series, Brownian motion series, periodic series, Fibonacci series, and audio signal series. We show: 1) how to generate time series in RStudio and then turn them into visibility graphs; 2) how to import time series stored in a directory and turn them into visibility graphs; 3) how to visualize networks using three types of algorithms, followed by the calculation and visualization of the main properties of complex networks. About the included code: the three included scripts generate visibility graphs from series generated by RStudio functions. They also calculate some complex-network metrics, plot the graph and its degree distribution, and plot the series and its histogram.
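The manual does not list which complex-network metrics the scripts compute. As an assumption, the Python sketch below computes three common ones for a simple undirected graph: mean degree, density, and the average clustering coefficient. It takes an edge list, so it can be applied to a visibility graph built by any method. Nodes with degree below 2 are given a clustering coefficient of 0, which is one common convention. The toy graph is hypothetical.

```python
from itertools import combinations

def network_metrics(n_nodes, edges):
    """Mean degree, density and average clustering coefficient of a simple
    undirected graph with nodes 0..n_nodes-1 and the given edge list."""
    neighbours = {v: set() for v in range(n_nodes)}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    m = sum(len(s) for s in neighbours.values()) // 2  # number of edges
    mean_degree = 2 * m / n_nodes
    density = 2 * m / (n_nodes * (n_nodes - 1))
    local = []
    for v, nbrs in neighbours.items():
        k = len(nbrs)
        if k < 2:
            local.append(0.0)  # convention: degree < 2 gives C = 0
            continue
        # Count links among the neighbours of v (closed triangles through v).
        links = sum(1 for u, w in combinations(nbrs, 2) if w in neighbours[u])
        local.append(links / (k * (k - 1) / 2))
    return {"mean_degree": mean_degree, "density": density,
            "avg_clustering": sum(local) / n_nodes}

# Hypothetical graph: a triangle (0, 1, 2) with a pendant node 3 on node 2.
print(network_metrics(4, [(0, 1), (1, 2), (0, 2), (2, 3)]))
```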
https://www.rioxx.net/licenses/all-rights-reserved/
Data for graph Illus 4.22. Date distribution of forms with a date range of less than 200 years shown by percentage of rim equivalents.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Three temporal graph datasets for node classification under distribution shift.
DBLP-Easy and DBLP-Hard are citation graph datasets. PharmaBio is a collaboration graph dataset.
Vertices are scientific publications, edges are either citations (DBLP) or at-least-one-common-author relationships (PharmaBio).
The task is to classify the vertices of the graph into the respective conference/journal venues (DBLP) or journal categories (PharmaBio). In the DBLP datasets, new classes may appear over time.
Each dataset follows the structure:
adjlist.txt -- the graph structure encoded as adjacency lists: in each row, the first entry is the source vertex, the remaining entries are adjacent vertices
X.npy -- numpy serialized format for node features indexed by node id corresponding to adjlist.txt
y.npy -- numpy serialized format for node labels indexed by node id corresponding to adjlist.txt
t.npy -- numpy serialized format for time steps indexed by node id corresponding to adjlist.txt
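The files listed above can be read with very little code. X.npy, y.npy, and t.npy load with numpy.load. The Python sketch below covers the remaining parts: parsing adjlist.txt, and building the graph visible at a given time step, as an incremental (lifelong) setting requires. It assumes whitespace-separated integer vertex ids in adjlist.txt; the delimiter is not stated above. The helper names and toy data are hypothetical, and edges are kept exactly as listed (no symmetrization).

```python
def read_adjlist(lines):
    """Parse adjacency-list rows: first entry = source vertex, rest = neighbours.
    Assumes whitespace-separated integer ids; blank lines are skipped."""
    adj = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        src, *nbrs = (int(p) for p in parts)
        adj.setdefault(src, set()).update(nbrs)
    return adj

def snapshot(adj, t, max_step):
    """Subgraph induced by the vertices whose time step t[v] <= max_step.
    In practice, t would come from numpy.load("t.npy")."""
    keep = {v for v in range(len(t)) if t[v] <= max_step}
    return {v: {u for u in adj.get(v, ()) if u in keep} for v in keep}

rows = ["0 1 2", "1 0", "2 0 3", "3 2"]  # hypothetical toy adjacency list
t = [0, 0, 1, 2]                          # hypothetical time step per vertex
adj = read_adjlist(rows)
print(snapshot(adj, t, max_step=1))
```

With the real files, `read_adjlist(open("adjlist.txt"))` can replace the toy rows, since iterating over a file yields its lines.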
A paper describing our incremental training and evaluation framework is published in IJCNN 2021 (Pre-print on arXiv: https://arxiv.org/abs/2006.14422).
If you use these datasets in your research, please cite the corresponding paper:
@inproceedings{galke2021lifelong, author={Galke, Lukas and Franke, Benedikt and Zielke, Tobias and Scherp, Ansgar}, booktitle={2021 International Joint Conference on Neural Networks (IJCNN)}, title={Lifelong Learning of Graph Neural Networks for Open-World Node Classification}, year={2021}, volume={}, number={}, pages={1-8}, doi={10.1109/IJCNN52387.2021.9533412} }
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Austria - Income distribution was 4.34 in December of 2024, according to EUROSTAT. The income distribution ratio is the ratio of the total income received by the 20% of the population with the highest income to that received by the 20% of the population with the lowest income.
The Graph extension for CKAN adds the ability to visualize data resources as graphs, giving users a more intuitive understanding of the information contained in datasets. It currently supports temporal and categorical graph types, enabling count-based visualizations over time or across categories. Although the current version is primarily designed for use with an Elasticsearch backend within the Natural History Museum's infrastructure, it is built to be extensible for broader applicability.
Key Features:
Temporal Graphs: Generates line graphs that display counts of data points over time, based on a designated date field within the resource. This makes it possible to visualize trends and patterns dynamically.
Categorical Graphs: Creates bar charts that show the distribution of counts for the values found in a specified field of a resource, making data groupings easier to understand.
Extensible Backend Architecture: Designed to support multiple data storage backends; Elasticsearch is currently implemented, paving the way for future integration with other systems such as PostgreSQL.
Template Customization: Includes a template (templates/graph/view.html) that can be extended to override or add custom content to the graph view, giving full control over the visualization design.
Configuration Options: The backend is selected through the .ini configuration file. Users can choose between Elasticsearch and SQL, allowing administrators to align the extension with their specific requirements.
Technical Integration: The Graph extension integrates with CKAN by adding a new view option to resources. Once enabled, the graph view appears as an available option alongside the existing resource viewers. Configuration requires adding 'graph' to the list of enabled plugins in the CKAN .ini file and setting the desired backend. The template templates/graph/view.html allows full customization of the view.
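For readers unfamiliar with count-based graphs, the Python sketch below shows, in memory, the kind of aggregation the two graph types display. It is not the extension's code: the extension delegates these counts to its configured backend (Elasticsearch). The monthly bucketing, the field names, and the sample records are hypothetical.

```python
from collections import Counter
from datetime import date

def temporal_counts(records, date_field):
    """Count records per calendar month of `date_field` (ISO 'YYYY-MM-DD');
    records without that field are skipped."""
    counts = Counter(
        date.fromisoformat(r[date_field]).strftime("%Y-%m")
        for r in records if r.get(date_field)
    )
    return dict(sorted(counts.items()))

def categorical_counts(records, field):
    """Count how often each value of `field` occurs, most common first."""
    return dict(Counter(r[field] for r in records if field in r).most_common())

# Hypothetical resource rows.
records = [
    {"collected": "2021-03-04", "family": "Apidae"},
    {"collected": "2021-03-19", "family": "Formicidae"},
    {"collected": "2021-05-01", "family": "Apidae"},
    {"family": "Apidae"},  # no date: ignored by the temporal count
]
print(temporal_counts(records, "collected"))    # data for a line graph
print(categorical_counts(records, "family"))    # data for a bar chart
```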
Benefits & Impact: The Graph extension enhances the usability of CKAN-managed datasets by providing interactive visualizations of data. Temporal graphs help users identify time-based trends, while categorical graphs illustrate data distribution. The extensible architecture ensures that the extension can be adapted to different data storage systems, improving its versatility. By providing a graphical representation of data, this extension makes it easier to understand complex information, benefiting both data providers and consumers.
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Employed full time: Wage and salary workers: Transportation, storage, and distribution managers occupations: 16 years and over (LEU0254472700A) from 2000 to 2024 about distributive, management, occupation, full-time, salaries, workers, transportation, 16 years +, wages, employment, and USA.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Exponents of the degree probability distributions: γHVG is the exponent of an exponential fit to the horizontal visibility graph (HVG) degree distribution, and γVG is the exponent of a power-law fit to the visibility graph (VG) degree distribution.
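The two exponents differ in which axis is logged before fitting. For an exponential fit, log P(k) is regressed on k. For a power-law fit, log P(k) is regressed on log k. In both cases the exponent is minus the slope. The Python sketch below implements both with ordinary least squares. It checks them against the known HVG result for uncorrelated random series, P(k) = (1/3)(2/3)^(k-2), whose exponent is ln(3/2) ≈ 0.405, and against a synthetic power law. This is an assumed fitting procedure, not necessarily the one behind the tabulated values. Least squares on log-transformed histograms is simple but can be biased for empirical power laws, and maximum-likelihood estimators are often preferred.

```python
import math

def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def fit_exponential_exponent(pk):
    """Fit P(k) ~ exp(-gamma * k): regress log P(k) on k; gamma = -slope."""
    ks = sorted(k for k, p in pk.items() if p > 0)
    return -least_squares_slope(ks, [math.log(pk[k]) for k in ks])

def fit_power_law_exponent(pk):
    """Fit P(k) ~ k**(-gamma): regress log P(k) on log k; gamma = -slope."""
    ks = sorted(k for k, p in pk.items() if p > 0 and k > 0)
    return -least_squares_slope([math.log(k) for k in ks],
                                [math.log(pk[k]) for k in ks])

# Exact HVG degree distribution for an uncorrelated random series.
hvg = {k: (1 / 3) * (2 / 3) ** (k - 2) for k in range(2, 15)}
print(round(fit_exponential_exponent(hvg), 3), round(math.log(1.5), 3))

# Synthetic power law with gamma = 2.5, as a sanity check of the VG-style fit.
vg = {k: k ** -2.5 for k in range(1, 50)}
print(round(fit_power_law_exponent(vg), 3))
```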
The graph shows the distribution of Hispanic immigrants in the United States in 2018, by industry and region of birth. In 2018, about 17.24 percent of Mexican immigrants were working in construction.