Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data we used to evaluate the Louvain Method in the study Benchmarking Graph Databases on the Problem of Community Detection. These data were synthetically generated using the LFR-Benchmark (3rd link). There are two types of files, networkX.dat and communityX.dat. The networkX.dat file contains the list of edges (nodes are labelled from 1 to the number of nodes; the edges are ordered and repeated twice, i.e. source-target and target-source). The first four lines of the networkX.dat file list the parameters we used to generate the data. The communityX.dat file contains a list of the nodes and their community memberships (memberships are labelled by integer numbers >= 1). Note that X corresponds to the number of nodes each dataset contains.
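As a rough illustration of the file layout described above, the sketch below loads one network/community pair with NetworkX and runs its Louvain implementation against the ground truth. It assumes the layout exactly as stated (four parameter lines, then one edge per line) and uses illustrative file names such as network1000.dat; it is not the code used in the study.

# Minimal sketch (not the study's code): load an LFR network/community pair
# and compare Louvain communities against the ground truth.
# Assumes networkX.dat starts with four parameter lines followed by one
# "source target" edge per line (each edge listed twice), and communityX.dat
# lists one "node community" pair per line.
import networkx as nx
from networkx.algorithms.community import louvain_communities

def load_lfr(network_path, community_path):
    G = nx.Graph()  # undirected, so the duplicated source/target pairs collapse
    with open(network_path) as f:
        for line in f.read().splitlines()[4:]:  # skip the four parameter lines
            if line.strip():
                u, v = map(int, line.split()[:2])
                G.add_edge(u, v)
    ground_truth = {}
    with open(community_path) as f:
        for line in f:
            if line.strip():
                node, comm = map(int, line.split()[:2])
                ground_truth[node] = comm
    return G, ground_truth

G, truth = load_lfr("network1000.dat", "community1000.dat")  # illustrative names
detected = louvain_communities(G, seed=42)
print(G.number_of_nodes(), "nodes;", len(detected), "detected vs",
      len(set(truth.values())), "ground-truth communities")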
As post hoc explanations are increasingly used to understand the behavior of Graph Neural Networks (GNNs), it becomes crucial to evaluate the quality and reliability of GNN explanations. However, assessing the quality of GNN explanations is challenging as existing graph datasets have no or unreliable ground-truth explanations for a given task. Here, we introduce a synthetic graph data generator, ShapeGGen, which can generate a variety of benchmark datasets (e.g., varying graph sizes, degree distributions, homophilic vs. heterophilic graphs) accompanied by ground-truth explanations. Further, the flexibility to generate diverse synthetic datasets and corresponding ground-truth explanations allows us to mimic the data generated by various real-world applications. We include ShapeGGen and additional XAI-ready real-world graph datasets in an open-source graph explainability library, GraphXAI. In addition, GraphXAI provides a broader ecosystem of data loaders, data processing functions, synthetic and real-world graph datasets with ground-truth explanations, visualizers, GNN model implementations, and a set of evaluation metrics to benchmark the performance of any given GNN explainer.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets of synthetic task flow graphs were generated to evaluate the performance and scalability of an optimal task allocation approach for applications of various structures and sizes in an environment following the edge/hub/cloud paradigm. The system under study comprised an edge device (e.g., a single-board computer attached to an unmanned aerial vehicle (UAV)) interacting with a hub device (e.g., a laptop), which in turn communicated with a more computationally capable cloud server. The objective was the minimization of either overall latency or overall energy consumption, under memory, storage, energy, and task precedence constraints. We considered that a percentage of the tasks required fixed allocation on the edge or hub device. We generated 18 task flow graphs of parallel, serial, and mixed (a combination of parallel and serial) structure with 10, 100, and 1000 nodes, and various in/out degrees, utilizing the Task Graphs For Free (TGFF) random task graph generator [1], [2]. Additional task parameters (e.g., execution time, power consumption, memory, storage, output data size) were included post-generation, using representative random values. More details are provided in README.txt and in [3].
Note: These datasets are released under a Creative Commons Attribution license. If you utilize these datasets in your work, please cite us using the corresponding Zenodo DOI https://doi.org/10.5281/zenodo.10654551.
References:
[1] R. P. Dick, D. L. Rhodes, and W. Wolf, "TGFF: Task graphs for free," Proceedings of the Sixth International Workshop on Hardware/Software Codesign (CODES/CASHE), 1998, pp. 97-101, doi: 10.1109/HSC.1998.666245.
[2] R. P. Dick, D. L. Rhodes, and K. Vallerio, "TGFF," https://robertdick.org/projects/tgff/.
[3] A. Kouloumpris, G. L. Stavrinides, M. K. Michael, and T. Theocharides, "An optimization framework for task allocation in the edge/hub/cloud paradigm," Future Generation Computer Systems, vol. 155, pp. 354-366, Jun. 2024, doi: 10.1016/j.future.2024.02.005.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variant data used during semi-synthetic data generation.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets of synthetic task graphs were generated to evaluate the performance and scalability of a multi-objective task allocation approach for workflow applications of various structures and sizes in a system based on the edge-hub-cloud paradigm. The targeted architecture comprised an edge device (e.g., a single-board computer attached to an unmanned aerial vehicle (UAV)) interacting with a hub device (e.g., a laptop), which in turn communicated with a more computationally capable cloud server. The objectives were the maximization of the overall reliability and the minimization of the overall latency of the application, under memory, storage, energy, and task precedence constraints. We considered that a percentage of the tasks required fixed allocation on the edge or hub device. Each task had a different vulnerability factor (i.e., probability of failure) on each device. We generated nine task graphs of serial, parallel, and mixed (a combination of serial and parallel) structure with 10, 100, and 1000 nodes, utilizing the Task Graphs For Free (TGFF) random task graph generator [1]. Additional task parameters (e.g., execution time, power consumption, vulnerability factor, memory, storage, output data size) were included post-generation, using representative random values. More details are provided in README.txt.
Note: These datasets are released under a Creative Commons Attribution license. If you utilize these datasets in your work, please cite us using the corresponding Zenodo DOI https://doi.org/10.5281/zenodo.10357101.
References:
[1] R. P. Dick, D. L. Rhodes, and W. Wolf, "TGFF: Task graphs for free," Proceedings of the Sixth International Workshop on Hardware/Software Codesign (CODES/CASHE'98), Seattle, WA, USA, 1998, pp. 97-101, doi: 10.1109/HSC.1998.666245.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Outline
This dataset was originally created for the Knowledge Graph Reasoning Challenge for Social Issues (KGRC4SI). It includes:
Video data that simulates daily-life actions in a virtual space, generated from Scenario Data.
Knowledge graphs transcribing the Video Data content ("who" did what "action" with what "object," when and where, and the resulting "state" or "position" of the object).
Knowledge graph embedding data created for machine-learning-based reasoning.
This data is released to the public as open data.
Details
Videos
mp4 format
203 action scenarios
For each scenario, there is a character rear view (file name ending in 0), an indoor camera switching view (file name ending in 1), and fixed camera views placed in each corner of the room (file names ending in 2-5). In addition, for each action scenario, data was generated for between 1 and 7 patterns with different room layouts (scenes), for a total of 1,218 videos.
Videos with slowly moving characters simulate the movements of elderly people.
Knowledge Graphs
RDF format
203 knowledge graphs corresponding to the videos
Includes schema and location supplement information
The schema is described below
SPARQL endpoints and query examples are available
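Since the knowledge graphs are plain RDF, they can also be explored locally with any SPARQL-capable library. The sketch below is only an illustration using rdflib with a hypothetical file name and a generic query; the dataset's own SPARQL endpoints and query examples cover the actual vh2kg vocabulary.

# Illustrative only: load one of the RDF knowledge graphs locally and count
# the classes it uses. The file name is hypothetical; see the ontology
# specification below for the actual vh2kg classes and properties.
from rdflib import Graph

g = Graph()
g.parse("scenario_0001_scene1.ttl")  # hypothetical file; format inferred from extension

query = """
SELECT ?cls (COUNT(?s) AS ?n)
WHERE { ?s a ?cls }
GROUP BY ?cls
ORDER BY DESC(?n)
"""
for cls, n in g.query(query):
    print(cls, n)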
Script Data
txt format
Data provided to VirtualHome2KG to generate videos and knowledge graphs
Includes the action title and a brief description in text format.
Embedding
Embedding vectors from the TransE, ComplEx, and RotatE models, created with DGL-KE (https://dglke.dgl.ai/doc/).
Embedding Vectors created with jRDF2vec (https://github.com/dwslab/jRDF2Vec).
Specification of Ontology
Please refer to the specification for descriptions of all classes, instances, and properties: https://aistairc.github.io/VirtualHome2KG/vh2kg_ontology.htm
Related Resources
KGRC4SI Final Presentations with automatic English subtitles (YouTube)
VirtualHome2KG (Software)
VirtualHome-AIST (Unity)
VirtualHome-AIST (Python API)
Visualization Tool (Software)
Script Editor (Software)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the repository for the ISWC 2023 Resource Track submission for Text2KGBench: Benchmark for Ontology-Driven Knowledge Graph Generation from Text. Text2KGBench is a benchmark to evaluate the capabilities of language models to generate KGs from natural language text guided by an ontology. Given an input ontology and a set of sentences, the task is to extract facts from the text while complying with the given ontology (concepts, relations, domain/range constraints) and being faithful to the input sentences.
It contains two datasets: (i) Wikidata-TekGen, with 10 ontologies and 13,474 sentences, and (ii) DBpedia-WebNLG, with 19 ontologies and 4,860 sentences.
An example
An example test sentence:
Test Sentence:
{"id": "ont_music_test_n", "sent": "\"The Loco-Motion\" is a 1962 pop song written by American songwriters Gerry Goffin and Carole King."}
An example ontology:
Ontology: Music Ontology
Expected Output:
{
  "id": "ont_k_music_test_n",
  "sent": "\"The Loco-Motion\" is a 1962 pop song written by American songwriters Gerry Goffin and Carole King.",
  "triples": [
    {
      "sub": "The Loco-Motion",
      "rel": "publication date",
      "obj": "01 January 1962"
    },
    {
      "sub": "The Loco-Motion",
      "rel": "lyrics by",
      "obj": "Gerry Goffin"
    },
    {
      "sub": "The Loco-Motion",
      "rel": "lyrics by",
      "obj": "Carole King"
    }
  ]
}
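To make the task concrete, a system's extracted triples can be scored against such an expected output at the (sub, rel, obj) level. The sketch below is only an illustration of triple-level precision/recall/F1, not the benchmark's official evaluation scripts (which are provided in the repository's evaluation directory).

# Illustration only: compare predicted triples against an expected-output
# record like the one above, at the (sub, rel, obj) level.
def triple_set(example):
    return {(t["sub"], t["rel"], t["obj"]) for t in example["triples"]}

def precision_recall_f1(predicted, gold):
    pred, ref = triple_set(predicted), triple_set(gold)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1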
The data is released under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) License.
The structure of the repo is as follows:
benchmark: the code used to generate the benchmark
evaluation: evaluation scripts for calculating the results
This benchmark contains data derived from the TekGen corpus (part of the KELM corpus) [1], released under the CC BY-SA 2.0 license, and the WebNLG 3.0 corpus [2], released under the CC BY-NC-SA 4.0 license.
[1] Oshin Agarwal, Heming Ge, Siamak Shakeri, and Rami Al-Rfou. 2021. Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3554–3565, Online. Association for Computational Linguistics.
[2] Claire Gardent, Anastasia Shimorina, Shashi Narayan, and Laura Perez-Beltrachini. 2017. Creating Training Corpora for NLG Micro-Planners. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The unfortunate lack of a widely used graph benchmark suite forces each research publication to create its own evaluation methodology, and this often results in mistakes or unnecessary differences. Common serious mistakes we have observed include: using trivially small input graphs, using only a single input graph topology, or using low-performance implementations as baselines. These methodological issues make it difficult for good ideas to stand out and cloud the reasoning behind why these ideas are beneficial.
In order for the research community to make progress on accelerating graph processing, it is important to be able to properly and reliably compare results. We created the GAP Benchmark Suite to standardize evaluations in order to alleviate the methodological issues we observed. Through standardization, we hope to not only make results easier to compare, but to also prevent common evaluation mistakes. We provide both a benchmark specification to standardize the methodology and a high-performance reference implementation to be used as a baseline. Our benchmark was co-designed with our workload characterization, and it has undergone multiple revisions guided by community feedback.
GAP Benchmark matrices: Scott Beamer, Krste Asanović, and David Patterson, as described in "The GAP Benchmark Suite", https://arxiv.org/abs/1508.03619.
(1) GAP-twitter (|V|=61.6M, |E|=1,468.4M, directed) is an example of a social network topology [18]. This particular crawl of Twitter has been commonly used by researchers and thus eases comparisons with prior work. By virtue of it coming from real-world data, it has interesting irregularities and the skew in its degree distribution can be a challenge for some implementations.
[18] Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is Twitter, a social network or a news media? International World Wide Web Conference (WWW), 2010.
A permuted version of this matrix appears as SNAP/twitter7 in the SuiteSparse Matrix Collection.
(2) GAP-web (|V|=50.6M, |E|=1,949.4M, directed) is a web-crawl of the .sk domain (sk-2005) [9]. Despite its large size, it exhibits substantial locality due to its topology and high average degree.
The matrix comes from the Laboratory for Web Algorithmics (LAW), Università degli Studi di Milano, http://law.di.unimi.it/index.php.
The pattern of this GAP-web matrix also appears as LAW/sk-2005 in the SuiteSparse Matrix Collection.
(3) GAP-road (|V|=23.9M, |E|=58.3M, directed) captures the distances of all of the roads in the USA [10]. Although it is substantially smaller than the rest of the graphs, it has a high diameter, which can cause some synchronous implementations to have long runtimes.
[10] 9th DIMACS Implementation Challenge -- Shortest Paths. http://www.dis.uniroma1.it/challenge9/, 2006.
The pattern of the GAP-road matrix also appears as DIMACS10/road_usa in the SuiteSparse Matrix Collection.
(4) GAP-kron (|V|=134.2M, |E|=2,111.6M, undirected) uses the Kronecker synthetic graph generator [19] with the same parameters as Graph 500 (A=0.57, B=C=0.19, D=0.05) [14]. It has been used frequently in research due to Graph 500, so it also provides continuity with prior work.
[19] Jurij Leskovec, Deepayan Chakrabarti, Jon Kleinberg, and Christos Faloutsos. Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication. European Conference on Principles and Practice of Knowledge Discovery in Databases, 2005.
[14] Graph500 benchmark. www.graph500.org.
(5) GAP-urand (|V|=134.2M, |E|=2,147.4M, undirected) is synthetically generated by the Erdős–Rényi model (Uniform Random) [11]. With respect to locality, it represents the worst case as every vertex has equal probability of being a neighbor of every other vertex. When contrasted with the similarly sized kron graph, it demonstrates the impact of kron's scale-free property.
[11] Paul Erdős and Alfréd Rényi. On random graphs I. Publicationes Mathematicae, 6:290–297, 1959.
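To illustrate how the two synthetic GAP graphs differ, the sketch below contrasts Kronecker/R-MAT-style edge sampling with the Graph 500 parameters (A=0.57, B=C=0.19, D=0.05) against uniform-random edge sampling, at a tiny toy scale. It is not the reference generator; it only shows why kron's degree distribution is skewed while urand's is flat.

# Toy illustration (not the GAP reference generator): contrast R-MAT/Kronecker
# edge sampling using the Graph 500 parameters with uniform-random sampling.
import random

def rmat_edge(scale, a=0.57, b=0.19, c=0.19, d=0.05):
    # Pick one of four quadrants at each of `scale` recursion levels.
    u = v = 0
    for _ in range(scale):
        r = random.random()
        if r < a:
            bits = (0, 0)
        elif r < a + b:
            bits = (0, 1)
        elif r < a + b + c:
            bits = (1, 0)
        else:
            bits = (1, 1)
        u = (u << 1) | bits[0]
        v = (v << 1) | bits[1]
    return u, v

def urand_edge(scale):
    n = 1 << scale
    return random.randrange(n), random.randrange(n)

def max_degree(edges):
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return max(deg.values())

scale, m = 10, 16 * (1 << 10)  # 1,024 vertices, 16 edges per vertex (toy scale)
kron_edges = [rmat_edge(scale) for _ in range(m)]
urand_edges = [urand_edge(scale) for _ in range(m)]
# The skewed degree distribution of the Kronecker graph shows up as a much
# larger maximum degree than in the uniform-random graph.
print("kron max degree:", max_degree(kron_edges),
      "| urand max degree:", max_degree(urand_edges))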
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Data Description
We release the synthetic data generated using the method described in the paper Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models (ACL 2024 Findings). The external knowledge we use is based on external knowledge graphs.
Generated Datasets
The original train/validation/test data, and the generated synthetic training data are listed as follows. For each dataset, we generate 5000 synthetic… See the full description on the dataset page: https://huggingface.co/datasets/ritaranx/clinical-synthetic-text-kg.
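The synthetic data can be pulled directly from the Hugging Face Hub with the datasets library; any configuration and split names are described on the dataset page, so none are assumed here.

# Minimal sketch: load the synthetic clinical text data from the Hugging Face
# Hub. Check the dataset page for the available configurations and splits.
from datasets import load_dataset

ds = load_dataset("ritaranx/clinical-synthetic-text-kg")
print(ds)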
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Synthetic data generating parameters. The table summarizes the generating parameters for the synthetic networks, showing the corresponding symbol, name, and range after the application of the constraints in Section e.2.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
IntelliGraphs is a collection of datasets for benchmarking Knowledge Graph Generation models. It consists of three synthetic datasets (syn-paths, syn-tipr, syn-types) and two real-world datasets (wd-movies, wd-articles). There is also a Python package available that loads these datasets and verifies new graphs using the semantics pre-defined for each dataset. It can also be used as a testbed for developing new generative models.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this paper, we introduce an R package, EATME (Exponentially weighted moving average (EWMA) control chart with Adjustments To Measurement Error). The main purpose of this package is to correct for measurement error effects in continuous or binary random variables and to develop corrected control charts based on the EWMA statistic. The corrected control charts can accurately detect out-of-control processes. The package contains a function to generate synthetic data and includes functions to determine a reasonable coefficient for the control limit as well as to estimate the average run length. For visualization, we also provide control charts showing the monitoring of in-control and out-of-control processes. Finally, the functions in this package are clearly demonstrated, and numerical studies show the validity of the package.
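For orientation, the chart the package builds on is the standard EWMA statistic with time-varying control limits. The sketch below is a plain EWMA chart in Python without EATME's measurement-error adjustment, shown only to illustrate what is being monitored; it does not reproduce the package's corrected charts.

# Plain EWMA control chart (illustration only, without the measurement-error
# adjustment that EATME provides):
#   z_t = lam * x_t + (1 - lam) * z_{t-1}
#   limits: mu0 +/- L * sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam)^(2t)))
import numpy as np

def ewma_chart(x, mu0, sigma, lam=0.2, L=3.0):
    z, signals, prev = [], [], mu0
    for t, xt in enumerate(x, start=1):
        prev = lam * xt + (1 - lam) * prev
        z.append(prev)
        half_width = L * sigma * np.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        if abs(prev - mu0) > half_width:
            signals.append(t)  # out-of-control signal at time t
    return np.array(z), signals

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 50), rng.normal(1.0, 1, 20)])  # mean shift at t=51
_, signals = ewma_chart(x, mu0=0.0, sigma=1.0)
print("first out-of-control signal at t =", signals[0] if signals else None)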
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The research objective is to design online algorithms for graph filter design over expanding graphs under conditions of known and unknown connectivity. The datasets used in this paper are available online, and code for generating synthetic data is included. The folder Recsys_new contains the experimental setup and online algorithms for movie rating prediction on Movielens100k. The folder Stochastic_Synthetic_New contains the experimental setup and online algorithms for signal interpolation on synthetic expanding graphs. The folder Stochastic_covid contains the code for Covid case count prediction over a growing city network.
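For context, the basic object being designed is a polynomial graph filter, which applies a weighted sum of powers of a graph shift operator S (for example, the adjacency matrix) to a graph signal x. The sketch below shows only this static building block on a toy graph; the repository's online algorithms additionally update such filters as the graph expands.

# Static polynomial graph filter y = sum_k h_k * S^k x (illustration only; the
# repository's algorithms update such filters online over expanding graphs).
import numpy as np

def graph_filter(S, x, h):
    y = np.zeros_like(x, dtype=float)
    Skx = x.astype(float)  # S^0 x
    for hk in h:
        y += hk * Skx
        Skx = S @ Skx      # advance to the next power of S
    return y

# Toy example: 4-node path graph, order-2 filter.
S = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 0.0, 0.0, 0.0])
print(graph_filter(S, x, h=[0.5, 0.3, 0.2]))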
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Artifacts for the paper titled Root Cause Analysis for Microservice System based on Causal Inference: How Far Are We?.
This artifact repository contains 9 compressed folders, as follows:
ID File Name Description
1 syn_circa.zip CIRCA10 and CIRCA50 datasets for Causal Discovery
2 syn_rcd.zip RCD10 and RCD50 datasets for Causal Discovery
3 syn_causil.zip CausIL10 and CausIL50 datasets for Causal Discovery
4 rca_circa.zip CIRCA10 and CIRCA50 datasets for RCA
5 rca_rcd.zip RCD10 and RCD50 datasets for RCA
6 online-boutique.zip Online Boutique dataset for RCA
7 sock-shop-1.zip Sock Shop 1 dataset for RCA
8 sock-shop-2.zip Sock Shop 2 dataset for RCA
9 train-ticket.zip Train Ticket dataset for RCA
Each zip file contains the generated/collected data from the corresponding data generator or microservice benchmark systems (e.g., online-boutique.zip contains metrics data collected from the Online Boutique system).
Details about the generation of our datasets
We use three different synthetic data generators from three previous RCA studies [15, 25, 28] to create the synthetic datasets: the CIRCA, RCD, and CausIL data generators. Their mechanisms are as follows:
1. The CIRCA data generator [28] generates a random causal directed acyclic graph (DAG) based on a given number of nodes and edges. From this DAG, time series data for each node is generated using a vector auto-regression (VAR) model. A fault is injected into a node by altering the noise term in the VAR model for two timestamps.
2. The RCD data generator [25] uses the pyAgrum package [3] to generate a random DAG based on a given number of nodes, subsequently generating discrete time series data for each node, with values ranging from 0 to 5. A fault is introduced into a node by changing its conditional probability distribution.
3. The CausIL data generator [15] generates causal graphs and time series data that simulate the behavior of microservice systems. It first constructs a DAG of services and metrics based on domain knowledge, then generates metric data for each node of the DAG using regressors trained on real metrics data. Unlike the CIRCA and RCD data generators, the CausIL data generator does not have the capability to inject faults.
To create our synthetic datasets, we first generate 10 DAGs whose nodes range from 10 to 50 for each of the synthetic data generators. Next, we generate fault-free datasets using these DAGs with different seedings, resulting in 100 cases for the CIRCA and RCD generators and 10 cases for the CausIL generator. We then create faulty datasets by introducing ten faults into each DAG and generating the corresponding faulty data, yielding 100 cases for the CIRCA and RCD data generators. The fault-free datasets (e.g. syn_rcd, syn_circa) are used to evaluate causal discovery methods, while the faulty datasets (e.g. rca_rcd, rca_circa) are used to assess RCA methods.
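As a rough sketch of the first mechanism above (a random DAG with VAR-style time series and a two-timestamp noise fault), and not the CIRCA generator's actual code, the following illustrates the idea:

# Rough sketch of the CIRCA-style mechanism described above (not the actual
# generator): sample a random DAG, generate VAR-like series where each node
# depends on its parents' previous values, and inject a fault by enlarging one
# node's noise term for two timestamps.
import numpy as np

def generate(n_nodes=10, n_edges=20, T=200, fault_node=3, fault_t=100, seed=0):
    rng = np.random.default_rng(seed)
    # Random DAG: only allow edges from lower- to higher-indexed nodes.
    parents = {j: [] for j in range(n_nodes)}
    while sum(len(p) for p in parents.values()) < n_edges:
        i, j = sorted(rng.choice(n_nodes, size=2, replace=False))
        if i not in parents[j]:
            parents[j].append(i)
    weights = {j: rng.uniform(0.2, 0.8, size=len(p)) for j, p in parents.items()}
    X = np.zeros((T, n_nodes))
    for t in range(1, T):
        noise = rng.normal(0.0, 1.0, n_nodes)
        if t in (fault_t, fault_t + 1):     # fault: altered noise for two steps
            noise[fault_node] += 10.0
        for j in range(n_nodes):
            if parents[j]:
                X[t, j] = weights[j] @ X[t - 1, parents[j]] + noise[j]
            else:
                X[t, j] = noise[j]
    return X, parents

X, dag = generate()
print(X.shape)  # (200, 10) time series with a fault around t=100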
We deploy three popular benchmark microservice systems: Sock Shop [6], Online Boutique [4], and Train Ticket [8], on a four-node Kubernetes cluster hosted by AWS. Next, we use the Istio service mesh [2] with Prometheus [5] and cAdvisor [1] to monitor and collect resource-level and service-level metrics of all services, as in previous works [25, 39, 59]. To generate traffic, we use the load generators provided by these systems and customise them to explore all services with 100 to 200 users concurrently. We then introduce five common faults (CPU hog, memory leak, disk IO stress, network delay, and packet loss) into five different services within each system. Finally, we collect metrics data before and after the fault injection operation. An overview of our setup is presented in the Figure below.
Code
The code to reproduce the experimental results in the paper is available at https://github.com/phamquiluan/RCAEval.
References
As in our paper.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides synthetically generated financial time series data, presented as OHLCV (Open-High-Low-Close-Volume) candlestick charts. A key feature of this dataset is the inclusion of technical analysis annotations (labels) meticulously created by a human analyst for each chart.
The primary goal is to offer a resource for training and evaluating machine learning models focused on automated technical analysis and chart pattern recognition. By providing synthetic data with high-quality human labels, this dataset aims to facilitate research and development in areas like algorithmic trading and financial visualization analysis.
This is an evolving dataset. It represents the initial phase of a larger labeling effort, and future updates are planned to incorporate a greater number and variety of labeled chart patterns.
The dataset is provided entirely as a collection of JSON files. Each file represents a single 300-candle chart window and contains:
metadata: Contains basic information related to the generation of the file (e.g., generation timestamp, version).
ohlcv_data: A sequence of 300 data points. Each point is a dictionary representing one time candle and includes:
time: Timestamp string (ISO 8601 format). Note: These timestamps maintain realistic intra-day time progression (hours, minutes), but the specific dates (Day, Month, Year) are entirely synthetic and do not align with real-world calendar dates.
open, high, low, close: Numerical values representing the candle's price range. Note: These values are synthetic and are not tied to any real financial instrument's price.
volume: A numerical value representing activity during the candle's period. Note: This is also a synthetic value.
labels: A dictionary containing the human-provided technical analysis annotations for the corresponding chart window:
horizontal_lines: A list of structures, each containing a price key. These typically denote significant horizontal levels identified by the labeler, such as support or resistance.
ray_lines: A list of structures, each defining a line segment via start_date, start_price, end_date, and end_price. These are used to represent patterns like trendlines, channel boundaries, or other linear formations observed by the labeler.
The dataset features synthetically generated candlestick patterns. The generation process focuses on creating structurally plausible chart sequences. Human analysts then carefully review these sequences and apply relevant technical analysis labels (support, resistance, trendlines).
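Putting the description together, a single chart-window file can be read and sanity-checked as in the sketch below; the file name is illustrative, and only the fields listed above are assumed.

# Minimal sketch: load one chart-window JSON file and sanity-check the fields
# described above. The file name is illustrative.
import json

with open("chart_window_0001.json") as f:
    window = json.load(f)

candles = window["ohlcv_data"]
assert len(candles) == 300
for c in candles:
    assert c["low"] <= min(c["open"], c["close"]) <= max(c["open"], c["close"]) <= c["high"]

labels = window["labels"]
levels = [line["price"] for line in labels["horizontal_lines"]]
rays = [(r["start_date"], r["start_price"], r["end_date"], r["end_price"])
        for r in labels["ray_lines"]]
print(len(levels), "horizontal levels,", len(rays), "ray lines,", window["metadata"])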
While the patterns may resemble those seen in financial markets, the underlying numerical data (price, volume, and the associated timestamps) is artificial and intentionally detached from any real-world financial data. Users should focus on the relative structure of the candles and the associated human-provided labels, rather than interpreting the absolute values as representative of any specific market or time.
This dataset is made possible through ongoing human labeling efforts and custom data generation software.
Overview
This dataset is a synthetic dataset created using the Scalable Data Generation (SDG) framework. It is structured for use with a thinking model; the inputs and outputs form question-and-answer pairs.
Data Generation Pipeline
Question Generation
Model: Qwen/Qwen3-30B-A3B-Instruct-2507
Assume the role of an academic graph expert and generate Ph.D.-level questions.
Reasoning Process Generation
Model: openai/gpt-oss-120b
Reassign the role of… See the full description on the dataset page: https://huggingface.co/datasets/ikedachin/difficult_problem_dataset_v3.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary of the arguments of the functions in the package EATME.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This is a synthetic reasoning dataset generated from the PrimeKG biomedical knowledge graph. It contains verifiable reasoning traces generated using the approach outlined in Synthetic CoT Reasoning Trace Generation from Knowledge Graphs. The synthetic chain-of-thought data is generated procedurally using program synthesis and logic programming which is able to produce vast quantities of verifiable forward reasoning traces with minimal human oversight. The benchmark is intended to be used to… See the full description on the dataset page: https://huggingface.co/datasets/extrasensory/reasoning-biochem.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset contains complementary data to the paper "The Least Cost Directed Perfect Awareness Problem: Complexity, Algorithms and Computations" [1]. Here, we make available two sets of instances of the combinatorial optimization problem studied in that paper, which deals with the spread of information on social networks. We also provide the best known solutions and bounds obtained through computational experiments for each instance.
The first input set includes 300 synthetic instances composed of graphs that resemble real-world social networks. These graphs were produced with a generator proposed in [2]. The second set consists of 14 instances built from graphs obtained by crawling Twitter [3].
The directories "synthetic_instances" and "twitter_instances" contain files that describe both sets of instances, all of which follow a common format.
The directories "solutions_for_synthetic_instances" and "solutions_for_twitter_instances" contain files that describe the best known solutions for both sets of instances, all of which follow a common format: the first line gives the number of vertices in the solution, followed by one line for each such vertex.
Lastly, two files, namely, "bounds_for_synthetic_instances.csv" and "bounds_for_twitter_instances.csv", enumerate the values of the best known lower and upper bounds for both sets of instances.
This work was supported by grants from Santander Bank, Brazil; the Brazilian National Council for Scientific and Technological Development (CNPq); and the São Paulo Research Foundation (FAPESP), Brazil.
Caveat: the opinions, hypotheses and conclusions or recommendations expressed in this material are the responsibility of the authors and do not necessarily reflect the views of Santander, CNPq, or FAPESP.
References
[1] F. C. Pereira, P. J. de Rezende. The Least Cost Directed Perfect Awareness Problem: Complexity, Algorithms and Computations. Submitted. 2023.
[2] B. Bollobás, C. Borgs, J. Chayes, and O. Riordan. Directed scale-free graphs. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’03, pages 132–139, 2003.
[3] C. Schweimer, C. Gfrerer, F. Lugstein, D. Pape, J. A. Velimsky, R. Elsässer, and B. C. Geiger. Generating simple directed social network graphs for information spreading. In Proceedings of the ACM Web Conference 2022, WWW ’22, pages 1475–1485, 2022.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets of synthetic workflows (task graphs) were generated to evaluate the performance and scalability of a multi-objective and multi-constrained scheduling approach for workflow applications of various structures, sizes, and sensing/actuating requirements in a cyber-physical system (CPS) based on the edge-hub-cloud paradigm. The examined CPS comprised four edge devices (i.e., single-board computers, each attached to an unmanned aerial vehicle (UAV) equipped with sensors/actuators) interacting with a hub device (e.g., a laptop), which in turn communicated with a more computationally capable cloud server. All system devices featured heterogeneous multicore processors with different processing core failure rates and varied sensing/actuating or other specialized capabilities. Our objectives were the minimization of the overall latency, the minimization of the overall energy consumption, and the maximization of the overall reliability of the workflow application in the specific CPS, under deadline, reliability, memory, storage, energy, capability, and task precedence constraints. We generated 25 random task graphs with 10, 20, 30, 40, and 50 nodes (5 task graphs for each size), utilizing the Task Graphs For Free (TGFF) random task graph generator [1], [2]. Additional task parameters (e.g., execution time, power consumption, memory, storage, output data size, capability, reliability threshold) were included post-generation, using appropriate values. More details are provided in README.txt.
References:
[1] R. P. Dick, D. L. Rhodes, and W. Wolf, "TGFF: Task graphs for free," Proceedings of the Sixth International Workshop on Hardware/Software Codesign (CODES/CASHE), 1998, pp. 97-101, doi: 10.1109/HSC.1998.666245.
[2] R. P. Dick, D. L. Rhodes, and K. Vallerio, "TGFF," https://robertdick.org/projects/tgff/.