Each spreadsheet contains numerical data of figure panels as indicated. (XLSX)
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This data set consists of files for all Laman graphs (minimally rigid graphs) with at most 12 vertices and files for their Laman numbers (number of complex realizations).
The data is computed by a combinatorial algorithm of Capco, Gallet, Grasegger, Koutschan, Lubbes and Schicho (see 10.1137/17M1118312 for a description and 10.5281/zenodo.1245506 for an implementation).
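For readers unfamiliar with the term, a Laman graph on n vertices has exactly 2n - 3 edges, and every subset of k >= 2 vertices induces at most 2k - 3 edges. The sketch below is a brute-force check of that condition. It is not the combinatorial algorithm referenced above and it does not compute Laman numbers. The function name `is_laman` and the vertex labelling 0..n-1 are assumptions made for illustration, since the file format of the data set is not described here.

```python
from itertools import combinations


def is_laman(n, edges):
    """Brute-force test of Laman's counting condition.

    n     -- number of vertices, assumed to be labelled 0..n-1
    edges -- iterable of (u, v) pairs of an undirected simple graph
    """
    # Normalise to a set of unordered pairs and ignore self-loops.
    edge_set = {frozenset(e) for e in edges if e[0] != e[1]}
    if n < 2 or len(edge_set) != 2 * n - 3:
        return False
    # Every proper vertex subset of size k must induce at most 2k - 3 edges.
    for k in range(2, n):
        for subset in combinations(range(n), k):
            chosen = set(subset)
            induced = sum(1 for e in edge_set if e <= chosen)
            if induced > 2 * k - 3:
                return False
    return True


triangle = [(0, 1), (1, 2), (0, 2)]
print(is_laman(3, triangle))
```

The check enumerates all vertex subsets, so it is exponential in n. That is acceptable for graphs with at most 12 vertices but not for larger ones.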
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Previous research supports the view that graphs are relevant decision aids for tasks related to the interpretation of numerical information. Moreover, the literature shows that different types of graphical information can help or harm the decision-making accuracy of accountants and financial analysts. We conducted a 4×2 mixed-design experiment to examine the effects of numerical information disclosure on financial analysts’ accuracy, and investigated the role of overconfidence in decision making. Results show that, compared to text, column graphs enhanced decision-making accuracy the most, followed by line graphs. No difference was found between tabular and textual disclosure. Overconfidence harmed accuracy, and both genders behaved overconfidently. Additionally, the type of disclosure (text, table, line graph and column graph) did not affect the overconfidence of individuals, providing evidence that overconfidence is a personal trait. This study makes three contributions. First, it provides evidence from a larger sample (295) of financial analysts, rather than a smaller sample of students, that graphs are relevant decision aids for tasks related to the interpretation of numerical information. Second, it uses text as a baseline to test how different forms of information disclosure (line and column graphs, and tables) can enhance the understandability of information. Third, it brings an internal factor into this process: overconfidence, a personal trait that harms the decision-making process of individuals. At the end of this paper, several research paths are highlighted for further study of the effect of internal factors (personal traits) on financial analysts’ decision-making accuracy regarding numerical information presented in graphical form. In addition, we offer suggestions concerning some practical implications for professional accountants, auditors, financial analysts and standard setters.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Main features of empirical graphs: Order (number of nodes), size (number of edges), and edge density (ratio between the size and the graph maximum size).
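As a small illustration of the edge density defined above, the sketch below divides the size by the maximum possible size. It assumes simple undirected graphs, for which the maximum size is n(n - 1)/2. The entry does not say whether the empirical graphs are directed, so treat that as an assumption, and the function name `edge_density` as hypothetical.

```python
def edge_density(order, size):
    """Ratio between the size and the maximum size of a simple undirected graph.

    order -- number of nodes
    size  -- number of edges
    """
    # Assumption: undirected, no self-loops, no multi-edges.
    max_size = order * (order - 1) // 2
    if max_size == 0:
        raise ValueError("edge density is undefined for fewer than 2 nodes")
    return size / max_size


print(edge_density(5, 5))
```

For a directed graph the maximum size would be n(n - 1) instead, which halves the density for the same edge count.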
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The GraphLand benchmark is introduced in the paper GraphLand: Evaluating Graph Machine Learning Models on Diverse Industrial Data. It provides node property prediction datasets from real-world industrial applications of graph machine learning.
Datasets: hm-categories, pokec-regions, web-topics, tolokers-2, city-reviews, artnet-exp, web-fraud, hm-prices, avazu-ctr, city-roads-M, city-roads-L, twitch-views, artnet-views, web-traffic.
Each dataset is provided in its own directory. Each dataset directory contains the following files:
* edgelist.csv — graph edges in the edgelist format. Note that some datasets have directed graphs and some have undirected graphs (see info.yaml for each dataset). Regardless of this, the edges are always provided in the directed format used by the graph deep learning libraries PyG and DGL, that is, if a graph is undirected, then each edge appears in the edgelist twice: as (u, v) and as (v, u).
* targets.csv — node-level targets for the task, one per node. Contains NaNs if dataset has some unlabeled nodes.
* features.csv — node-level features, one feature vector per node. Node features can be either numerical or categorical (see info.yaml for each dataset for lists of numerical and categorical features). Numerical features contain NaNs if some values are unknown.
* split_masks_RL.csv — table with columns train, val, test containing masks for the RL (random low) split for the transductive setting (10%/10%/80% train/val/test random stratified split).
* split_masks_RH.csv — table with columns train, val, test containing masks for the RH (random high) split for the transductive setting (50%/25%/25% train/val/test random stratified split).
* split_masks_TH.csv — table with columns train, val, test containing masks for the TH (temporal high) split for the transductive and inductive settings (50%/25%/25% train/val/test temporal split). For the inductive setting, remove from the full graph all nodes and their incident edges from the val and test subsets to get the train graph, and remove from the full graph all nodes and their incident edges from the test subset to get the val graph. The TH split is not provided for datasets which are almost static by nature (road networks) or for which there was no necessary temporal information available: city-reviews, city-roads-M, city-roads-L, web-traffic.
* info.yaml — a yaml dictionary with dataset metadata. Contains the following keys:
  * dataset_name — the name of the dataset.
  * task — prediction task, one of: multiclass classification, binary classification, regression.
  * metric — the recommended metric for evaluation: accuracy for multiclass classification, AP (average precision) for binary classification, R2 (R-squared, coefficient of determination) for regression.
  * graph_is_directed — a boolean value indicating whether the graph is directed.
  * has_unlabeled_nodes — a boolean value indicating whether the dataset has unlabeled nodes.
  * has_nans_in_numerical_features — a boolean value indicating whether the dataset has NaNs in numerical features (categorical features never have NaNs, as unknown values are simply encoded as a separate category).
  * target_name — the name of the target variable from the targets.csv file.
  * numerical_features_names — a list of names of all numerical features from features.csv. Numerical features can have widely different scales and distributions, so in practice it might be useful to apply some transformation to them, e.g., standard scaling or a quantile transformation.
  * fraction_features_names — a subset of numerical_features_names, a list of names of all numerical features that have the meaning of fractions and are thus always in the [0, 1] range. These features are listed separately because, due to their range, it may not be necessary to apply transformations to them, in contrast to other numerical features.
  * categorical_features_names — a list of names of all categorical features from features.csv. In practice it might be useful to apply one-hot encoding to them. Each feature from features.csv is either in numerical_features_names or in categorical_features_names.
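The inductive-setting rule described for the TH split (drop val and test nodes to get the train graph, drop test nodes to get the val graph) can be sketched as follows. This is a minimal illustration, not GraphLand's own loading code. It assumes the edges have already been read from edgelist.csv as (u, v) integer pairs, that the masks are boolean sequences with one entry per node, and that node i corresponds to row i of the mask table. The helper names `drop_nodes` and `inductive_graphs` are hypothetical.

```python
def drop_nodes(edges, removed):
    """Return the edges whose two endpoints both survive the removal."""
    removed = set(removed)
    return [(u, v) for u, v in edges if u not in removed and v not in removed]


def inductive_graphs(edges, val_mask, test_mask):
    """Build the train and val edge lists for the inductive setting.

    edges     -- list of (u, v) pairs in the directed format described above
    val_mask  -- boolean sequence, True where a node is in the val subset
    test_mask -- boolean sequence, True where a node is in the test subset
    """
    # Assumption: node ids are row indices of the mask table.
    val_nodes = {i for i, flag in enumerate(val_mask) if flag}
    test_nodes = {i for i, flag in enumerate(test_mask) if flag}
    train_edges = drop_nodes(edges, val_nodes | test_nodes)
    val_edges = drop_nodes(edges, test_nodes)
    return train_edges, val_edges


# A path 0 - 1 - 2 - 3 stored with both directions of every edge.
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
train_edges, val_edges = inductive_graphs(
    edges,
    val_mask=[False, False, True, False],
    test_mask=[False, False, False, True],
)
print(train_edges)  # [(0, 1), (1, 0)]
print(val_edges)    # [(0, 1), (1, 0), (1, 2), (2, 1)]
```

Because undirected edges appear as both (u, v) and (v, u), filtering pair by pair keeps the two directions consistent: either both survive or both are dropped.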
GraphLand datasets are provided under the Apache 2.0 license.
If you found GraphLand datasets useful, please cite the following work:
@article{bazhenov2025graphland,
title={{GraphLand: Evaluating Graph Machine Learning Models on Diverse Industrial Data}},
author={Bazhenov, Gleb and Platonov, Oleg and Prokhorenkova, Liudmila},
journal={arXiv preprint},
year={2025}
}
All numerical data for the graphs and their statistics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Characteristics of datasets used in the experiments, where |V| is the number of nodes, |E| is the number of edges, |C| is the number of communities and sizes is the range of the number of nodes in each community.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Each empirical graph is associated with an estimated p-value of being an outcome of an Erdős-Rényi model, a fitness scale-free model, a Watts-Strogatz small-world model, or a geometric model. As in Table 1, empirical graphs are sorted according to their order.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In graph theory, a topological index is a numerical value that correlates well with certain physical properties of a molecule. It serves as an indicator of how a chemical structure behaves. Shannon’s entropy describes a comparable loss of data in information transmission networks and has found use in the field of information theory. Inspired by the concept of Shannon’s entropy, we have calculated some topological descriptors for fractal and Cayley-type dendrimer trees. We also find the entropy that is predicted by these indices.
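The abstract does not give the formulas it uses, so the following is only a generic sketch of Shannon's entropy, H = -sum(p_i * log(p_i)), applied to a list of non-negative weights normalised to a probability distribution. How such weights would be derived from a particular topological index is an assumption left open here, and the function name `shannon_entropy` is hypothetical.

```python
import math


def shannon_entropy(weights):
    """Shannon entropy (natural log) of weights normalised to sum to 1.

    weights -- non-negative numbers, e.g. per-edge contributions to an index
               (that interpretation is an assumption, not taken from the paper)
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    # Zero weights contribute nothing, by the convention 0 * log(0) = 0.
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)


print(shannon_entropy([1, 1, 1, 1]))
```

Equal weights give the maximum value log(k) for k terms, and a single non-zero weight gives zero.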
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Numerical edge analysis of population graphs according to similarity measure.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
The safe and effective application of human pluripotent stem cells (hPSCs) in research and regenerative medicine requires precise control over pluripotency and cell fate. Pluripotency is characterized by high levels of histone acetylation and aerobic glycolysis, while differentiation is associated with metabolic shifts and reduced histone acetylation. These transitions are driven, in part, by the availability of metabolic substrates that influence epigenetic regulation. A central enzyme in this process is pyruvate dehydrogenase (PDH), which converts glycolytic pyruvate into acetyl coenzyme A (Ac-CoA), the essential donor for histone acetylation.
Here, we investigate how PDH activity regulates histone acetylation and pluripotency maintenance under physiologically relevant oxygen conditions (5% and 21% O₂), in response to FGF2 signaling and changes in reactive oxygen species (ROS). We show that active PDH promotes global histone H3 acetylation and upregulates the expression of the key pluripotency factor NANOG, specifically under 5% O₂. Mechanistically, we identify a novel FGF2–MEK1/2–ERK1/2–ROS axis that modulates PDH activity via redox-dependent regulation. Notably, this effect is oxygen-sensitive and absent at atmospheric oxygen levels (21% O₂).
Our findings position PDH as a redox-sensitive metabolic switch that connects energy metabolism with epigenetic control of pluripotency by regulating Ac-CoA availability. This work highlights the critical role of oxygen tension, ROS homeostasis, and growth factor signaling in shaping the metabolic–epigenetic landscape of hPSCs, with implications for optimizing stem cell culture and differentiation protocols.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
There are 180 large-scale, high-density (97-99%) instances of the Max-Cut problem, with Q matrices ranging from 1000 by 1000 to 90000 by 90000. For the 90000 by 90000 instances, the files are split into multiple 2 GB pieces, such as MC90000_*.txt.gz_a, ...MC90000_*.txt.gz.d . To recover a large data file after downloading its pieces, concatenate them, for example:
    copy /b file1 + file2 + file3 + file4 filetogether
For the MC90000_1.txt data:
    copy /b MC90000_1.txt.gz_a +....+ MC90000_1.txt.gz_e MC90000_1.txt.gz
    gunzip MC90000_1.txt.gz
There are three types of weights on the instances. The MCxx_yy_a.txt.xz instances have weights 1 and -1. The MCxx_yy_b.txt.xz instances have random values between -10 and 10. The MCxx_yy_c.txt.xz instances have random values between -1000 and 1000. All data files are compressed with the XZ tool.
For each instance, there is a text file in the following format (rudy output format):
    n m
    h_1 t_1 c_{h_1,t_1}
    h_2 t_2 c_{h_2,t_2}
    ...
    h_m t_m c_{h_m,t_m}
where n is the number of nodes, m is the number of edges, and for each edge i, h_i and t_i are the end nodes and c_{h_i,t_i} is the weight. Nodes are numbered from 1 up to n. All instances are generated as complete graphs.
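A minimal sketch of reading the rudy output format described above, together with the cut value of a given partition, is shown below. It assumes the file has already been decompressed and read into a string, and it parses weights as floats. The function names `parse_rudy` and `cut_value` are hypothetical and are not part of the data set.

```python
def parse_rudy(text):
    """Parse rudy output: 'n m' followed by m triples 'h t c'.

    Returns (n, edges), where edges is a list of (h, t, weight) tuples
    with 1-based node ids.
    """
    tokens = text.split()
    n, m = int(tokens[0]), int(tokens[1])
    if len(tokens) < 2 + 3 * m:
        raise ValueError("file ends before all m edges were read")
    edges = []
    for i in range(m):
        h, t, c = tokens[2 + 3 * i: 5 + 3 * i]
        edges.append((int(h), int(t), float(c)))
    return n, edges


def cut_value(edges, side):
    """Sum the weights of edges with exactly one end node in `side`."""
    side = set(side)
    return sum(c for h, t, c in edges if (h in side) != (t in side))


sample = "3 3\n1 2 1\n2 3 -1\n1 3 1\n"
n, edges = parse_rudy(sample)
print(n, len(edges))            # 3 3
print(cut_value(edges, {1}))    # 2.0
```

Splitting on whitespace rather than on lines keeps the parser tolerant of files where the header and the edge triples are not on separate lines. For the largest instances a streaming reader would be preferable to loading the whole file into memory.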
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The graph shows the number of articles published in the discipline of ^.
Numerical data underlying graphs and summary statistics presented in the main figures.
Supplementary Tables and Numerical Source for Graphs "Distinct Populations of Lung Capillary Endothelial Cells and Their Functional Significance"
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Using the User Manual included in the research paper, and the Graph Design Example file as a reference, the user enters or saves all the vertices and edges needed to specify the model of the system topography.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset information
A road network of California. Intersections and endpoints are represented by
nodes and the roads connecting these intersections or road endpoints are
represented by undirected edges.
Dataset statistics
Nodes 1965206
Edges 5533214
Nodes in largest WCC 1957027 (0.996)
Edges in largest WCC 5520776 (0.998)
Nodes in largest SCC 1957027 (0.996)
Edges in largest SCC 5520776 (0.998)
Average clustering coefficient 0.0464
Number of triangles 120676
Fraction of closed triangles 0.06039
Diameter (longest shortest path) 850
90-percentile effective diameter 5e+002
Source (citation)
J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney. Community Structure in Large
Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters.
arXiv.org:0810.1355, 2008.
Files
File Description
roadNet-CA.txt.gz California road network
Dataset information
This is a road network of Pennsylvania. Intersections and endpoints are
represented by nodes, and the roads connecting these intersections or endpoints
are represented by undirected edges.
Dataset statistics
Nodes 1088092
Edges 3083796
Nodes in largest WCC 1087562 (1.000)
Edges in largest WCC 3083028 (1.000)
Nodes in largest SCC 1087562 (1.000)
Edges in largest SCC 3083028 (1.000)
Average clustering coefficient 0.0465
Number of triangles 67150
Fraction of closed triangles 0.05941
Diameter (longest shortest path) 782
90-percentile effective diameter 5.3e+002
Source (citation)
J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney. Community Structure in Large
Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters.
arXiv.org:0810.1355, 2008.
Files
File Description
roadNet-PA.txt.gz Pennsylvania road network
...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the in-air handwritten number and shape data used in the paper:
B. Alwaely and C. Abhayaratne, "Graph Spectral Domain Feature Learning With Application to in-Air Hand-Drawn Number and Shape Recognition," in IEEE Access, vol. 7, pp. 159661-159673, 2019, doi: 10.1109/ACCESS.2019.2950643.
The dataset contains the following:
- Readme.txt
- InAirNumberShapeDataset.zip, containing:
  - Number folder (with 2 subfolders, for Matlab and Excel)
  - Shapes folder (with 2 subfolders, for Matlab and Excel)
The datasets include the in-air drawn number and shape hand movement paths captured by a Kinect sensor. The number sub-dataset includes 500 instances of each number from 0 to 9, for a total of 5000 number instances. Similarly, the shape sub-dataset includes 500 instances of each of 10 different arbitrary 2D shapes, for a total of 5000 shape instances. The dataset provides the X, Y, Z coordinates of the hand movement path in Matlab (M-file) and Excel formats, together with the corresponding labels.
The creation of this dataset received The University of Sheffield ethics approval under application #023005, granted on 19/10/2018.
https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt
Yearly citation counts for the publication titled "Conflict-Free Connection Number and Size of Graphs".
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The number of messages for graph computation
Each spreadsheet contains numerical data of figure panels as indicated. (XLSX)