Use the Chart Viewer template to display bar charts, line charts, pie charts, histograms, and scatterplots to complement a map. Include multiple charts to view with a map or side by side with other charts for comparison. Up to three charts can be viewed side by side or stacked, but you can access and view all the charts that are authored in the map.

Examples:
Present a bar chart representing average property value by county for a given area.
Compare charts based on multiple population statistics in your dataset.
Display an interactive scatterplot based on two values in your dataset along with an essential set of map exploration tools.

Data requirements
The Chart Viewer template requires a map with at least one chart configured.

Key app capabilities
Multiple layout options - Choose Stack to display charts stacked with the map, or choose Side by side to display charts side by side with the map.
Manage chart - Reorder, rename, or turn charts on and off in the app.
Multiselect chart - Compare two charts in the panel at the same time.
Bookmarks - Allow users to zoom and pan to a collection of preset extents that are saved in the map.
Home, Zoom controls, Legend, Layer List, Search

Supportability
This web app is designed responsively to be used in browsers on desktops, mobile phones, and tablets. We are committed to ongoing efforts towards making our apps as accessible as possible. Please feel free to leave a comment on how we can improve the accessibility of our apps for those who use assistive technologies.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description. The NetVote dataset contains the outputs of the NetVote program when applied to voting data from VoteWatch (http://www.votewatch.eu/).
These results were used in the following conference papers:
Source code. The NetVote source code is available on GitHub: https://github.com/CompNet/NetVotes.
Citation. If you use our dataset or tool, please cite article [1] above.
@InProceedings{Mendonca2015,
author = {Mendonça, Israel and Figueiredo, Rosa and Labatut, Vincent and Michelon, Philippe},
title = {Relevance of Negative Links in Graph Partitioning: A Case Study Using Votes From the {E}uropean {P}arliament},
booktitle = {2\textsuperscript{nd} European Network Intelligence Conference ({ENIC})},
year = {2015},
pages = {122-129},
address = {Karlskrona, SE},
publisher = {IEEE Publishing},
doi = {10.1109/ENIC.2015.25},
}
-------------------------
Details. This archive contains the following folders:
-------------------------
License. These data are shared under a Creative Commons 0 (CC0) license.
Contact. Vincent Labatut <vincent.labatut@univ-avignon.fr> & Rosa Figueiredo <rosa.figueiredo@univ-avignon.fr>
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The role of a knowledge graph encompasses the representation, organization, retrieval, reasoning, and application of knowledge, providing a rich and robust cognitive foundation for artificial intelligence systems and applications. When we learn new things, discover that some old information was wrong, observe changes and progress, or adopt new technology standards, we need to update knowledge graphs. However, in some environments the initial knowledge cannot be known. For example, we cannot access the full code of a piece of software, even if we have purchased it. In such circumstances, is there a way to update a knowledge graph without prior knowledge? In this paper, we investigate whether there is a method for this situation within the framework of Dalal revision operators. We first prove that finding the optimal solution in this environment is a strongly NP-complete problem. We then propose two algorithms, Flaccid_search and Tight_search, which apply under different conditions, and we prove that both algorithms find the desired results.
Bipartite networks, also known as two-mode networks or affiliation networks, are a class of networks in which actors or objects are partitioned into two sets, with interactions taking place across but not within sets. These networks are omnipresent in society, encompassing phenomena such as student-teacher interactions, coalition structures, and international treaty participation. With growing data availability and a proliferation of statistical estimators and software, scholars have increasingly sought to understand the methods available to model the data-generating processes in these networks. This article compares three methods for doing so: (1) Logit; (2) the bipartite Exponential Random Graph Model (ERGM); and (3) the Relational Event Model (REM). The comparison demonstrates the relevance of choices with respect to dependence structures, temporality, parameter specification, and data structure. As an empirical example, the article examines the ego network of tweets using #RamNavami on April 21, 2021, during Ram Navami, a Hindu festival celebrating the birth of Lord Ram. The results of the analysis illustrate that critical modeling choices make a difference in the estimated parameters and the conclusions to be drawn from them.
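To make the two-mode structure concrete, here is a minimal Python sketch using networkx; the node names are illustrative and not drawn from the article's data:

import networkx as nx
from networkx.algorithms import bipartite

# Toy two-mode network: actors on one side, events/objects on the other,
# with edges allowed only across the two sets (never within a set).
B = nx.Graph()
B.add_nodes_from(["user1", "user2", "user3"], bipartite=0)   # actors
B.add_nodes_from(["#RamNavami", "treatyA"], bipartite=1)     # events/objects
B.add_edges_from([
    ("user1", "#RamNavami"), ("user2", "#RamNavami"),
    ("user2", "treatyA"), ("user3", "treatyA"),
])

print(nx.is_bipartite(B))  # True: no within-set edges
# One-mode projection onto the actors: users linked via shared events.
actors = {n for n, d in B.nodes(data=True) if d["bipartite"] == 0}
print(sorted(bipartite.projected_graph(B, actors).edges()))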
Users can customize how data on a number of health indicators are presented, and the resulting tables, charts, and maps can be downloaded. Entire datasets are also available to download. Background Global Health Facts is a Kaiser Family Foundation website that provides global health data on the following topics: HIV/AIDS; TB; malaria; other conditions, diseases and risk indicators; programs, funding and financing; health workforce and capacity; demography and population; income and the economy. User Functionality Raw data (by topic) can be downloaded, or users can create customized reports, charts, graphs or tables to compare two or more countries on different health indicators. Specific profiles for just one country or for one health topic can also be generated. Users can view data as a table, chart or map. Rankings of countries are also available. Data Notes Data sources include UNAIDS, WHO, and the CIA, and links to the specific sources are provided. Annual data are updated as they become available. The most recent data are from 2009 (however, this varies by exposure), and the site does not specify when new data become available.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Eggs US fell to 3.21 USD/Dozen on July 30, 2025, down 3.24% from the previous day. Over the past month, the price of Eggs US has risen 25.14%, and it is up 17.05% compared to the same time last year, according to trading on a contract for difference (CFD) that tracks the benchmark market for this commodity. This dataset includes a chart with historical data for Eggs US.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Knowledge graph construction from heterogeneous data has seen a lot of uptake
in the last decade, from compliance-oriented approaches to performance
optimizations with respect to execution time. However, beyond execution time,
other metrics for comparing knowledge graph construction, e.g., CPU or memory
usage, are usually not considered. This challenge aims at benchmarking systems
to find which RDF graph construction system optimizes for metrics such as
execution time, CPU, memory usage, or a combination of these metrics.
Task description
The task is to reduce and report the execution time and computing resources
(CPU and memory usage) for the parameters listed in this challenge, compared
to the state of the art of existing tools and the baseline results provided
by this challenge. The challenge is not limited to execution time, i.e.,
creating the fastest pipeline, but also covers computing resources, i.e.,
achieving the most efficient pipeline.
We provide a tool which can execute such pipelines end-to-end. The tool also
collects and aggregates the metrics needed for this challenge, such as
execution time and CPU and memory usage, as CSV files. Moreover, information
about the hardware used during the execution of the pipeline is recorded as
well, to allow a fair comparison of different pipelines. Your pipeline should
consist of Docker images which can be executed on Linux to run the tool. The
tool has already been tested with existing systems, relational databases
(e.g., MySQL and PostgreSQL), and triplestores (e.g., Apache Jena Fuseki and
OpenLink Virtuoso), which can be combined in any configuration. You are
strongly encouraged to use this tool to participate in this challenge. If you
prefer to use a different tool, or if our tool imposes technical requirements
you cannot meet, please contact us directly.
Part 1: Knowledge Graph Construction Parameters
These parameters are evaluated using synthetically generated data to gain more
insight into their influence on the pipeline.
Data
Mappings
Part 2: GTFS-Madrid-Bench
The GTFS-Madrid-Bench provides insight into the pipeline with real data from
the public transport domain in Madrid.
Scaling
Heterogeneity
Example pipeline
The ground truth dataset and baseline results are generated in different steps
for each parameter:
The pipeline is executed 5 times, and the median execution time of each step
is calculated and reported. Each step with the median execution time is then
reported in the baseline results with all its measured metrics.
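For illustration, a minimal Python sketch of this aggregation step, assuming hypothetical per-run CSV files with "step" and "time" columns (the actual layout produced by the challenge tool may differ):

import csv
import statistics
from collections import defaultdict

# Hypothetical layout: one metrics CSV per run, each row holding a step
# name and its measured execution time in seconds.
runs = ["run1.csv", "run2.csv", "run3.csv", "run4.csv", "run5.csv"]

times = defaultdict(list)
for path in runs:
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            times[row["step"]].append(float(row["time"]))

# Report the median execution time of each step across the 5 runs.
for step, values in sorted(times.items()):
    print(f"{step}: median {statistics.median(values):.2f}s over {len(values)} runs")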
The query timeout is set to 1 hour and the knowledge graph construction
timeout to 24 hours. The execution is performed with the following tool:
https://github.com/kg-construct/challenge-tool; you can adapt the execution
plans of this example pipeline to your own needs.
Each parameter has its own directory in the ground truth dataset with the
following files:
metadata.json
Datasets
Knowledge Graph Construction Parameters
The dataset consists of:
Format
All input datasets are provided as CSV; depending on the parameter that is
being evaluated, the number of rows and columns may differ. The first row is
always the CSV header.
GTFS-Madrid-Bench
The dataset consists of:
Format
CSV datasets always have a header as their first row.
JSON and XML datasets have their own schema.
Evaluation criteria
Submissions must evaluate the following metrics:
Expected output
Duplicate values
Scale | Number of Triples |
---|---|
0 percent | 2000000 triples |
25 percent | 1500020 triples |
50 percent | 1000020 triples |
75 percent | 500020 triples |
100 percent | 20 triples |
Empty values
Scale | Number of Triples |
---|---|
0 percent | 2000000 triples |
25 percent | 1500000 triples |
50 percent | 1000000 triples |
75 percent | 500000 triples |
100 percent | 0 triples |
Mappings
Scale | Number of Triples |
---|---|
1TM + 15POM | 1500000 triples |
3TM + 5POM | 1500000 triples |
5TM + 3POM | 1500000 triples |
15TM + 1POM | 1500000 triples |
Properties
Scale | Number of Triples |
---|---|
1M rows 1 column | 1000000 triples |
1M rows 10 columns | 10000000 triples |
1M rows 20 columns | 20000000 triples |
1M rows 30 columns | 30000000 triples |
Records
Scale | Number of Triples |
---|---|
10K rows 20 columns | 200000 triples |
100K rows 20 columns | 2000000 triples |
1M rows 20 columns | 20000000 triples |
10M rows 20 columns | 200000000 triples |
Joins
1-1 joins
Scale | Number of Triples |
---|---|
0 percent | 0 triples |
25 percent | 125000 triples |
50 percent | 250000 triples |
75 percent | 375000 triples |
100 percent | 500000 triples |
1-N joins
Scale | Number of Triples |
---|---|
1-10 0 percent | 0 triples |
1-10 25 percent | 125000 triples |
1-10 50 percent | 250000 triples |
1-10 75 percent | 375000 triples |
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains information on application install interactions of users in the Myket Android application market. The dataset was created for evaluating interaction prediction models, which require user and item identifiers along with timestamps of the interactions. Hence, the dataset can be used for interaction prediction and for building a recommendation system. Furthermore, the data forms a dynamic network of interactions, and we can also perform network representation learning on the nodes in the network, which are users and applications.
Data Creation
The dataset was initially generated by the Myket data team, and later cleaned and subsampled by Erfan Loghmani, a master's student at Sharif University of Technology at the time. The data team focused on a two-week period and randomly sampled 1/3 of the users with interactions during that period. They then selected install and update interactions for three months before and after the two-week period, resulting in interactions spanning about 6 months and two weeks.
We further subsampled and cleaned the data to focus on application download interactions. We identified the top 8000 most installed applications and selected interactions related to them. We retained users with more than 32 interactions, resulting in 280,391 users. From this group, we randomly selected 10,000 users, and the data was filtered to include only interactions for these users. The detailed procedure can be found here.
Data Structure
The dataset has two main files.

myket.csv: This file contains the interaction information and follows the same format as the datasets used in the "JODIE: Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks" (ACM SIGKDD 2019) project. However, this data does not contain state labels and interaction features, so the associated columns are all zero.

app_info_sample.csv: This file comprises features associated with the applications present in the sample. For each individual application, information such as the approximate number of installs, average rating, count of ratings, and category is included. These features provide insights into the applications present in the dataset.

Dataset Details
For a detailed summary of the data's statistics, including information on users, applications, and interactions, please refer to the Python notebook available at summary-stats.ipynb. The notebook provides an overview of the dataset's characteristics and can be helpful for understanding the data's structure before using it for research or analysis.
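As a minimal illustration, the interaction file can be loaded with pandas; this sketch assumes the JODIE-style layout described above, with a header row and the user and item identifiers in the first two columns:

import pandas as pd

# Minimal sketch; myket.csv is assumed to have a header row, with the
# user and item identifiers in the first two columns (JODIE convention).
df = pd.read_csv("myket.csv")
user_col, item_col = df.columns[0], df.columns[1]

print("interactions:", len(df))
print("users:", df[user_col].nunique())
print("apps:", df[item_col].nunique())
print("mean interactions per user:", round(len(df) / df[user_col].nunique(), 1))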
Top 20 Most Installed Applications
Package Name | Count of Interactions |
---|---|
com.instagram.android | 15292 |
ir.resaneh1.iptv | 12143 |
com.tencent.ig | 7919 |
com.ForgeGames.SpecialForcesGroup2 | 7797 |
ir.nomogame.ClutchGame | 6193 |
com.dts.freefireth | 6041 |
com.whatsapp | 5876 |
com.supercell.clashofclans | 5817 |
com.mojang.minecraftpe | 5649 |
com.lenovo.anyshare.gps | 5076 |
ir.medu.shad | 4673 |
com.firsttouchgames.dls3 | 4641 |
com.activision.callofduty.shooter | 4357 |
com.tencent.iglite | 4126 |
com.aparat | 3598 |
com.kiloo.subwaysurf | 3135 |
com.supercell.clashroyale | 2793 |
co.palang.QuizOfKings | 2589 |
com.nazdika.app | 2436 |
com.digikala | 2413 |
Comparison with SNAP Datasets
The Myket dataset introduced in this repository exhibits distinct characteristics compared to the real-world datasets used by the JODIE project. The table below provides a comparative overview of the key dataset characteristics:
Dataset | #Users | #Items | #Interactions | Average Interactions per User | Average Unique Items per User |
---|---|---|---|---|---|
Myket | 10,000 | 7,988 | 694,121 | 69.4 | 54.6 |
LastFM | 980 | 1,000 | 1,293,103 | 1,319.5 | 158.2 |
Reddit | 10,000 | 984 | 672,447 | 67.2 | 7.9 |
Wikipedia | 8,227 | 1,000 | 157,474 | 19.1 | 2.2 |
MOOC | 7,047 | 97 | 411,749 | 58.4 | 25.3 |
The Myket dataset stands out by having an ample number of both users and items, highlighting its relevance for real-world, large-scale applications. Unlike the LastFM, Reddit, and Wikipedia datasets, where users exhibit repetitive item interactions, the Myket dataset contains comparatively few repetitive interactions. This characteristic reflects the diverse nature of user behaviors in the Android application market environment.
Citation
If you use this dataset in your research, please cite the following preprint:
@misc{loghmani2023effect,
title={Effect of Choosing Loss Function when Using T-batching for Representation Learning on Dynamic Networks},
author={Erfan Loghmani and MohammadAmin Fazli},
year={2023},
eprint={2308.06862},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
How much time do people spend on social media?
As of 2024, the average daily social media usage of internet users worldwide amounted to 143 minutes per day, down from 151 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of three hours and 49 minutes on social media each day. In comparison, the daily time spent with social media in the U.S. was just two hours and 16 minutes.

Global social media usage
Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to keep up with current events and stay in touch with friends.

Global impact of social media
Social media has a wide-reaching and significant impact on not only online activities but also offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased political polarization, and heightened everyday distractions.
10th DIMACS Implementation Challenge
Updated July 2012
http://www.cc.gatech.edu/dimacs10/index.shtml http://www.cise.ufl.edu/research/sparse/dimacs10
As stated on their main website ( http://dimacs.rutgers.edu/Challenges/ ), the "DIMACS Implementation Challenges address questions of determining realistic algorithm performance where worst case analysis is overly pessimistic and probabilistic models are too unrealistic: experimentation can provide guides to realistic algorithm performance where analysis fails."
For the 10th DIMACS Implementation Challenge, the two related problems of graph partitioning and graph clustering were chosen. Graph partitioning and graph clustering are among the aforementioned questions or problem areas where theoretical and practical results deviate significantly from each other, so that experimental outcomes are of particular interest.
Problem Motivation
Graph partitioning and graph clustering are ubiquitous subtasks in many application areas. Generally speaking, both techniques aim at the identification of vertex subsets with many internal and few external edges. To name only a few, problems addressed by graph partitioning and graph clustering algorithms are:
Challenge Goals
One goal of this Challenge is to create a reproducible picture of the state-of-the-art in the area of graph partitioning (GP) and graph clustering (GC) algorithms. To this end we are identifying a standard set of benchmark instances and generators.
Moreover, after initiating a discussion with the community, we would like to establish the most appropriate problem formulations and objective functions for a variety of applications.
Another goal is to enable current researchers to compare their codes with each other, in hopes of identifying the most effective algorithmic innovations that have been proposed.
The final goal is to publish proceedings containing results presented at the Challenge workshop, and a book containing the best of the proceedings papers.
Problems Addressed
The precise problem formulations need to be established in the course of the Challenge. The descriptions below serve as a starting point.
Graph partitioning:
The most common formulation of the graph partitioning problem for an undirected graph G = (V,E) asks for a division of V into k pairwise disjoint subsets (partitions) such that all partitions are of approximately equal size and the edge-cut, i.e., the total number of edges having their incident nodes in different subdomains, is minimized. The problem is known to be NP-hard.
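In standard notation (a textbook formulation of the problem just described, not an official Challenge definition), the problem asks to

\[
\min \ \mathrm{cut}(V_1,\dots,V_k) = \bigl|\{\,\{u,v\} \in E : u \in V_i,\ v \in V_j,\ i \neq j\,\}\bigr|
\]

subject to \(V_1 \cup \dots \cup V_k = V\), \(V_i \cap V_j = \emptyset\) for \(i \neq j\), and a balance constraint such as \(|V_i| \le (1+\varepsilon)\lceil |V|/k \rceil\) for all \(i\).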
Graph clustering:
Clustering is an important tool for investigating the structural properties of data. Generally speaking, clustering refers to the grouping of objects such that objects in the same cluster are more similar to each other than to objects of different clusters. The similarity measure depends on the underlying application. Clustering graphs usually refers to the identification of vertex subsets (clusters) that have significantly more internal edges (to vertices of the same cluster) than external ones (to vertices of another cluster).
There are 12 data sets in the DIMACS10 collection:
clustering: real-world graphs commonly used as benchmarks
coauthor: citation and co-author networks
Delaunay: Delaunay triangulations of random points in the plane
dyn-frames: frames from a 2D dynamic simulation
Kronecker: synthetic graphs from the Graph500 benchmark
numerical: graphs from numerical simulation
random: random geometric graphs (random points in the unit square)
streets: real-world street networks
Walshaw: Chris Walshaw's graph partitioning archive
matrix: graphs from the UF collection (not added here)
redistrict: census networks
star-mixtures: artificially generated from sets of real graphs
Some of the graphs already exist in the UF Collection. In some cases, the original graph is unsymmetric, with values, whereas the DIMACS graph is the symmetrized pattern of A+A'. Rather than adding duplicate patterns to the UF Collection, a MATLAB script is provided at http://www.cise.ufl.edu/research/sparse/dimacs10 which downloads each matrix from the UF Collection via UFget and then performs whatever operation is required to convert the matrix to the DIMACS graph problem. Also posted on that page is a MATLAB code (metis_graph) for reading the DIMACS *.graph files into MATLAB.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes multiple columns:
A. a sequential student ID
B. an ID that defines a random group label and the notation
C. the used notation: user stories or use cases
D. the case they were assigned to: IFA, Sim, or Hos
E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
F. a categorical representation of the grade as L/M/H, where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise
G. the total number of classes in the student's conceptual model
H. the total number of relationships in the student's conceptual model
I. the total number of classes in the expert's conceptual model
J. the total number of relationships in the expert's conceptual model
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below)
P. the researchers' judgement on how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present.
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either
with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is
incorrectly represented in the student model, either (i) via an attribute,
method, or relationship rather than a class, or
(ii) using a generic term (e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical
implementation aspect, e.g., access control. Classes that represent the legacy
system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in
CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in
CM-Expert.
All the calculations and information provided in the following sheets
originate from that raw data.
Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,
including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes. However, we also provide the size ratio for the number of relationships between the student and expert models.
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. It is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
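Written as formulas, using the tag counts defined above:

\[
\mathrm{correctness} = \frac{AL}{AL + OM + SO + WR},
\qquad
\mathrm{completeness} = \frac{AL + WR}{AL + WR + OM}.
\]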
For sheet 4, as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (t-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation - UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case - SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by the exam grades, converted to the categorical values High, Medium, and Low.
Data exchange with RO-Crates and Knowledge Graphs
Workshop at Open Science Festival 2023, Köln, Germany

Citation: Soiland-Reyes, S., Castro, L. J., & Rebholz-Schuhmann, D. (2023, July 5). Data exchange with RO-Crates and Knowledge Graphs. Zenodo. https://doi.org/10.5281/zenodo.10552449
Date: Wednesday 5th July 2023, 09:30-12:30 CEST
Room: Room John Nash (CECAD Lecture Hall)
Chairs: Stian Soiland-Reyes (RO-Crate, ELIXIR-UK, BY-COVID, FAIR-IMPACT, EOSC-Life, EuroScienceGateway); Leyla Jael Castro (Bioschemas, NFDI4DataScience, ZB MED Information Centre for Life Sciences); Dietrich Rebholz-Schuhmann (Scientific Director, ZB MED)

Abstract: Digital Objects (e.g., data, software) and formal knowledge representations (e.g., structured metadata) form the two sides of the Linked Open Science coin: scientists want to exchange their data as Open Science material, possibly embedding it into RO-Crates, while they also want to share their findings and knowledge in a formalised representation, possibly deploying Knowledge Graph representations. What is the meeting point between data distributed via RO-Crates and data that is part of a Knowledge Graph? Do RO-Crates and Knowledge Graphs serve different types of data, or do they complement each other? What would be the dis/advantages of having Knowledge Graphs in RO-Crates, or RO-Crates as nodes in Knowledge Graphs? In this workshop, we will first introduce RO-Crates and then have a round-table open discussion to compare the potential of the different approaches for capturing the data and the knowledge of the scientific world.

RO-Crate is a community effort to practically achieve FAIR packaging of research objects (digital objects like data, methods, software) with structured metadata. RO-Crate uses well-established Web standards and FAIR principles. For common metadata representations, RO-Crate builds on schema.org, a mature and general mark-up vocabulary used by search engines including Google Dataset Search. RO-Crate is adopted by many EU/EOSC projects as a pragmatic implementation of the FAIR Digital Objects vision.

Agenda (Wednesday 2023-07-05, times in CEST):
09:30 Overview of FAIR data publishing and RO-Crate - Leyla Jael Castro, Stian Soiland-Reyes (files included in this record)
09:50 A very brief introduction to making metadata with JSON-LD - Stian Soiland-Reyes (files included in this record)
10:00 Tutorial: FAIRify a dataset using just enough metadata - Leyla Jael Castro (FAIRify datasets with Bioschemas tutorial; files included in this record)
10:30 Tutorial: Packaging a dataset with its metadata as an RO-Crate - Stian Soiland-Reyes (see training material)
10:50 Making your own metadata profile - Stian Soiland-Reyes
11:00 Coffee break
11:30 Demo: Using Linked Data tooling to query knowledge graphs, and challenges - Stian Soiland-Reyes (https://github.com/stain/ro-crate-sparql/blob/main/ro-crate-sparql.ipynb)
11:50 Open discussion, feedback and requirements from early adopters - Moderator: Dietrich Rebholz-Schuhmann
12:20 Wrap-up and next steps - Lead: Stian Soiland-Reyes
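To give a flavour of what the JSON-LD and RO-Crate tutorials cover, here is a minimal Python sketch that writes a bare-bones ro-crate-metadata.json following the RO-Crate 1.1 layout; the dataset name and file are placeholders, not workshop material:

import json

# Minimal RO-Crate: a metadata descriptor, a root data entity ("./"),
# and one data file, all described in a flat JSON-LD @graph.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "Example dataset",  # placeholder
            "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
            "hasPart": [{"@id": "data.csv"}],
        },
        {"@id": "data.csv", "@type": "File", "name": "Example data file"},
    ],
}

with open("ro-crate-metadata.json", "w") as f:
    json.dump(crate, f, indent=2)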
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Knowledge graph construction from heterogeneous data has seen a lot of uptake
in the last decade, from compliance-oriented approaches to performance
optimizations with respect to execution time. However, beyond execution time,
other metrics for comparing knowledge graph construction, e.g., CPU or memory
usage, are usually not considered. This challenge aims at benchmarking systems
to find which RDF graph construction system optimizes for metrics such as
execution time, CPU, memory usage, or a combination of these metrics.
Task description
The task is to reduce and report the execution time and computing resources
(CPU and memory usage) for the parameters listed in this challenge, compared
to the state of the art of existing tools and the baseline results provided
by this challenge. The challenge is not limited to execution time, i.e.,
creating the fastest pipeline, but also covers computing resources, i.e.,
achieving the most efficient pipeline.
We provide a tool which can execute such pipelines end-to-end. The tool also
collects and aggregates the metrics needed for this challenge, such as
execution time and CPU and memory usage, as CSV files. Moreover, information
about the hardware used during the execution of the pipeline is recorded as
well, to allow a fair comparison of different pipelines. Your pipeline should
consist of Docker images which can be executed on Linux to run the tool. The
tool has already been tested with existing systems, relational databases
(e.g., MySQL and PostgreSQL), and triplestores (e.g., Apache Jena Fuseki and
OpenLink Virtuoso), which can be combined in any configuration. You are
strongly encouraged to use this tool to participate in this challenge. If you
prefer to use a different tool, or if our tool imposes technical requirements
you cannot meet, please contact us directly.
The set of new specifications for the RDF Mapping Language (RML), established by the W3C Community Group on Knowledge Graph Construction, provides test cases for each module:
These test cases are evaluated in this track of the challenge to determine their feasibility, correctness, etc., by applying them in implementations. This track is in beta status because the new specifications have not yet seen any implementation, so it may contain bugs and issues. If you find problems with the mappings, output, etc., please report them to the corresponding repository of each module.
Through this track we aim to spark the development of implementations for the new specifications and to improve the test cases. Let us know about your problems with the test cases and we will try to find a solution.
Part 1: Knowledge Graph Construction Parameters
These parameters are evaluated using synthetically generated data to gain more
insight into their influence on the pipeline.
Data
Mappings
Part 2: GTFS-Madrid-Bench
The GTFS-Madrid-Bench provides insight into the pipeline with real data from
the public transport domain in Madrid.
Scaling
Heterogeneity
Example pipeline
The ground truth dataset and baseline results are generated in different steps
for each parameter:
The pipeline is executed 5 times, and the median execution time of each step
is calculated and reported. Each step with the median execution time is then
reported in the baseline results with all its measured metrics.
The knowledge graph construction timeout is set to 24 hours.
The execution is performed with the following tool:
https://github.com/kg-construct/challenge-tool; you can adapt the execution
plans of this example pipeline to your own needs.
Each parameter has its own directory in the ground truth dataset with the
following files:
metadata.json
Datasets
Knowledge Graph Construction Parameters
The dataset consists of:
Format
All input datasets are provided as CSV; depending on the parameter that is
being evaluated, the number of rows and columns may differ. The first row is
always the CSV header.
GTFS-Madrid-Bench
The dataset consists of:
Format
CSV datasets always have a header as their first row.
JSON and XML datasets have their own schema.
Evaluation criteria
Submissions must evaluate the following metrics:
Expected output
Duplicate values
Scale | Number of Triples |
---|---|
0 percent | 2000000 triples |
25 percent | 1500020 triples |
50 percent | 1000020 triples |
75 percent | 500020 triples |
100 percent | 20 triples |
Empty values
Scale | Number of Triples |
---|---|
0 percent | 2000000 triples |
25 percent | 1500000 triples |
50 percent | 1000000 triples |
75 percent | 500000 triples |
100 percent | 0 triples |
Mappings
Scale | Number of Triples |
---|---|
1TM + 15POM | 1500000 triples |
3TM + 5POM | 1500000 triples |
5TM + 3POM | 1500000 triples |
15TM + 1POM | 1500000 triples |
Properties
Scale | Number of Triples |
---|---|
1M rows 1 column | 1000000 triples |
1M rows 10 columns | 10000000 triples |
1M rows 20 columns | 20000000 triples |
1M rows 30 columns | 30000000 triples |
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cost of food in India decreased 1.06 percent in June of 2025 over the same month in the previous year. This dataset provides - India Food Inflation - actual values, historical data, forecast, chart, statistics, economic calendar and news.
Introduction
This dataset contains the data described in the paper "A deep neural network approach to predicting clinical outcomes of neuroblastoma patients" by Tranchevent, Azuaje and Rajapakse. More precisely, this dataset contains the topological features extracted from graphs built from publicly available expression data (see details below). This dataset does not contain the original expression data, which are available elsewhere. We thank the scientists who generated and shared these data (please see below the relevant links and publications).

Content
File names start with the name of the publicly available dataset they are built on (among "Fischer", "Maris" and "Versteeg"). This name is followed by a tag indicating whether they contain raw data ("raw", which means, in this case, the raw topological features) or TensorFlow-formatted data ("TF"). This tag is then followed by a unique identifier representing a unique configuration. The configuration file "Global_configuration.tsv" contains details about these configurations, such as which topological features are present and which clinical outcome is considered. The code associated with the same manuscript that uses these data is at https://gitlab.com/biomodlih/SingalunDeep. The procedure by which the raw data are transformed into the TensorFlow-ready data is described in the paper.

File format
All files are TSV files that correspond to matrices with samples as rows and features as columns (or clinical data as columns for clinical data files). The data files contain various sets of topological features that were extracted from the sample graphs (or Patient Similarity Networks, PSN). The clinical files contain the relevant clinical outcomes. The raw data files only contain the topological data. For instance, the file "Fischer_raw_2d0000_data.tsv" contains 24 values for each sample, corresponding to the 12 centralities computed for both the microarray (Fischer-M) and RNA-seq (Fischer-R) datasets. The TensorFlow-ready files do not contain the sample identifiers in the first column. However, they contain two extra columns at the end. The first extra column is the sample weights (for the classifiers, and because we very often have a dominant class). The second extra column is the class labels (binary), based on the clinical outcome of interest.

Dataset details
The Fischer dataset is used to train, evaluate and validate the models, so the dataset is split into train/eval/valid files, which contain respectively 249, 125 and 124 rows (samples) of the original 498 samples. In contrast, the other two datasets (Maris and Versteeg) are smaller and are only used for validation (and therefore have no training or evaluation file). The Fischer dataset also has more data files because various configurations were tested (see manuscript). In contrast, the validation using the Maris and Versteeg datasets is only done for a single configuration, and there are therefore fewer files. For Fischer, a few configurations are listed in the global configuration file but there is no corresponding raw data. This is because these items are derived from concatenations of the original raw data (see global configuration file and manuscript for details).

References
This dataset is associated with Tranchevent L., Azuaje F., Rajapakse J.C., A deep neural network approach to predicting clinical outcomes of neuroblastoma patients.
If you use these data in your research, please do not forget to also cite the researchers who generated the original expression datasets.

Fischer dataset:
Zhang W. et al., Comparison of RNA-seq and microarray-based models for clinical endpoint prediction. Genome Biology 16(1) (2015). doi:10.1186/s13059-015-0694-1
Wang C. et al., The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance. Nat. Biotechnol. 32(9), 926–932. doi:10.1038/nbt.3001

Versteeg dataset:
Molenaar J.J. et al., Sequencing of neuroblastoma identifies chromothripsis and defects in neuritogenesis genes. Nature 483(7391), 589–593. doi:10.1038/nature10910

Maris dataset:
Wang Q. et al., Integrative genomics identifies distinct molecular classes of neuroblastoma and shows that multiple genes are targeted by regional alterations in DNA copy number. Cancer Res. 66(12), 6050–6062. doi:10.1158/0008-5472.CAN-05-4618

Project supported by the Fonds National de la Recherche (FNR), Luxembourg (SINGALUN project). This research was also partially supported by Tier-2 grant MOE2016-T2-1-029 from the Ministry of Education, Singapore.
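As an illustration of the file format described above, a short Python sketch; the file name is hypothetical (following the naming scheme), and the split into features, weights and labels follows the description of the TensorFlow-ready files:

import pandas as pd

# Hypothetical file name following the naming scheme described above.
df = pd.read_csv("Fischer_TF_2d0000_train.tsv", sep="\t", header=None)

# TensorFlow-ready files: topological features first, then two extra
# columns at the end (sample weights and binary class labels).
features = df.iloc[:, :-2].to_numpy()
weights = df.iloc[:, -2].to_numpy()
labels = df.iloc[:, -1].to_numpy().astype(int)
print(features.shape, weights.shape, labels.shape)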
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Inflation Rate in India decreased to 2.10 percent in June from 2.82 percent in May of 2025. This dataset provides - India Inflation Rate - actual values, historical data, forecast, chart, statistics, economic calendar and news.
As of January 2024, #love was the most used hashtag on Instagram, being included in over two billion posts on the social media platform. #Instagood and #instagram were used over one billion times as of early 2024.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Inflation Rate in Turkey decreased to 35.05 percent in June from 35.41 percent in May of 2025. This dataset provides the latest reported value for - Turkey Inflation Rate - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The USD/VND exchange rate fell to 26,199.0000 on July 31, 2025, down 0.01% from the previous session. Over the past month, the Vietnamese Dong has weakened 0.26%, and is down by 3.90% over the last 12 months. Vietnamese Dong - values, historical data, forecasts and news - updated on July of 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This README documents the datasets and RDF graph used in the research article "AI-Powered Food Sustainability: Exploiting Knowledge Graphs for Reducing Carbon Footprints and Land Use" by Anand K. Gavai, Suniti Vadalkar, and Mahak Sharma. The study employs AI-driven knowledge graphs to analyze the environmental impacts of food items, focusing on protein sources, and to propose sustainability interventions aligned with the United Nations Sustainable Development Goals (SDGs). The datasets and RDF graph provided here support the construction and querying of the knowledge graph for sustainability analysis.
All data and associated source code are publicly available in a Zenodo repository:
DOI: 10.5281/zenodo.10143973
The datasets consist of structured environmental data for various food items, integrated into a knowledge graph to assess sustainability metrics such as carbon footprint, land use, water use, scarcity-weighted water use, and eutrophication. The data primarily focuses on global averages from 2010, sourced from Poore & Nemecek (2018), with an emphasis on protein-rich foods (e.g., beef, cheese, legumes) and other dietary staples.
The data were sourced from:
The repository includes CSV files and an RDF graph in Turtle format:
Five CSV files provide environmental metrics for 38 food items, focusing on 2010 global averages:
# Prefix declarations assumed for illustration; see the Zenodo repository for the actual namespaces.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://example.org/foodkg#> .

:Beef rdf:type :FoodItem ;
    :hasCarbonFootprint 33.30 ;
    :hasLandUse 43.24 ;
    :hasWaterUse 2714 ;
    :hasScarcityWeightedWaterUse 119805 ;
    :hasEutrophication 365.29 .
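As a sketch of how such a graph can be queried, a minimal Python example using rdflib; the namespace and the carbon-footprint threshold are assumptions for illustration, not taken from the repository:

from rdflib import Graph

# Load the (hypothetical) Turtle excerpt above and ask for food items
# whose carbon footprint exceeds a threshold.
ttl = """
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://example.org/foodkg#> .

:Beef rdf:type :FoodItem ;
    :hasCarbonFootprint 33.30 ;
    :hasLandUse 43.24 .
"""

g = Graph()
g.parse(data=ttl, format="turtle")

query = """
PREFIX : <http://example.org/foodkg#>
SELECT ?food ?cf WHERE {
    ?food a :FoodItem ;
          :hasCarbonFootprint ?cf .
    FILTER (?cf > 10)
}
"""
for food, cf in g.query(query):
    print(food, cf)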
The datasets cover the following environmental impact metrics:
These datasets and the RDF graph were used to:
Example applications:
All datasets and the RDF graph are available in the Zenodo repository:
The repository includes:
This work was supported by the ‘High Tech for a Sustainable Future’ capacity building programme of the 4TU Federation in the Netherlands.
For questions or further information, please contact the corresponding author:
If you use this dataset or RDF graph, please cite the original manuscript:
Gavai, A.K., Vadalkar, S., Sharma, M. (2025). AI-Powered Food Sustainability: Exploiting Knowledge Graphs for Reducing Carbon Footprints and Land Use.