Use the Chart Viewer template to display bar charts, line charts, pie charts, histograms, and scatterplots to complement a map. Include multiple charts to view with a map or side by side with other charts for comparison. Up to three charts can be viewed side by side or stacked, but you can access and view all the charts that are authored in the map.

Examples:
Present a bar chart representing average property value by county for a given area.
Compare charts based on multiple population statistics in your dataset.
Display an interactive scatterplot based on two values in your dataset along with an essential set of map exploration tools.

Data requirements
The Chart Viewer template requires a map with at least one chart configured.

Key app capabilities
Multiple layout options - Choose Stack to display charts stacked with the map, or choose Side by side to display charts side by side with the map.
Manage chart - Reorder, rename, or turn charts on and off in the app.
Multiselect chart - Compare two charts in the panel at the same time.
Bookmarks - Allow users to zoom and pan to a collection of preset extents that are saved in the map.
Home, Zoom controls, Legend, Layer List, Search

Supportability
This web app is designed responsively to be used in browsers on desktops, mobile phones, and tablets. We are committed to ongoing efforts towards making our apps as accessible as possible. Please feel free to leave a comment on how we can improve the accessibility of our apps for those who use assistive technologies.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description. The NetVote dataset contains the outputs of the NetVote program when applied to voting data from VoteWatch (http://www.votewatch.eu/).
These results were used in the following conference papers:
Source code. The NetVote source code is available on GitHub: https://github.com/CompNet/NetVotes.
Citation. If you use our dataset or tool, please cite article [1] above.
@InProceedings{Mendonca2015,
author = {Mendonça, Israel and Figueiredo, Rosa and Labatut, Vincent and Michelon, Philippe},
title = {Relevance of Negative Links in Graph Partitioning: A Case Study Using Votes From the {E}uropean {P}arliament},
booktitle = {2\textsuperscript{nd} European Network Intelligence Conference ({ENIC})},
year = {2015},
pages = {122-129},
address = {Karlskrona, SE},
publisher = {IEEE Publishing},
doi = {10.1109/ENIC.2015.25},
}
-------------------------
Details. This archive contains the following folders:
-------------------------
License. These data are shared under a Creative Commons 0 (CC0) license.
Contact. Vincent Labatut <vincent.labatut@univ-avignon.fr> & Rosa Figueiredo <rosa.figueiredo@univ-avignon.fr>
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The role of a knowledge graph encompasses the representation, organization, retrieval, reasoning, and application of knowledge, providing a rich and robust cognitive foundation for artificial intelligence systems and applications. When we learn new things, discover that some old information was wrong, observe changes and progress, or adopt new technology standards, we need to update knowledge graphs. However, in some environments the initial knowledge cannot be known. For example, we cannot access the full code of a piece of software, even if we have purchased it. In such circumstances, is there a way to update a knowledge graph without prior knowledge? In this paper, we investigate whether there is a method for this situation within the framework of Dalal revision operators. We first prove that finding the optimal solution in this environment is a strongly NP-complete problem. We then propose two algorithms, Flaccid_search and Tight_search, which apply under different conditions, and we prove that both algorithms find the desired results.
Bipartite networks, also known as two-mode networks or affiliation networks, are a class of networks in which actors or objects are partitioned into two sets, with interactions taking place across but not within sets. These networks are omnipresent in society, encompassing phenomena such as student-teacher interactions, coalition structures, and international treaty participation. With growing data availability and a proliferation of statistical estimators and software, scholars have increasingly sought to understand the methods available to model the data-generating processes in these networks. This article compares three methods for doing so: (1) Logit; (2) the bipartite Exponential Random Graph Model (ERGM); and (3) the Relational Event Model (REM). The comparison demonstrates the relevance of choices with respect to dependence structures, temporality, parameter specification, and data structure. As an empirical example, the article examines the ego network of tweets using #RamNavami on April 21, 2021, during Ram Navami, a Hindu festival celebrating the birth of Lord Ram. The results of the analysis illustrate that critical modeling choices make a difference in the estimated parameters and the conclusions to be drawn from them.
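To make the two-mode structure concrete, here is a minimal Python sketch using networkx; the node names are illustrative and not drawn from the article's data:

import networkx as nx
from networkx.algorithms import bipartite

# Toy two-mode network: actors on one side, events/objects on the other,
# with edges allowed only across the two sets (never within a set).
B = nx.Graph()
B.add_nodes_from(["user1", "user2", "user3"], bipartite=0)   # actors
B.add_nodes_from(["#RamNavami", "treatyA"], bipartite=1)     # events/objects
B.add_edges_from([
    ("user1", "#RamNavami"), ("user2", "#RamNavami"),
    ("user2", "treatyA"), ("user3", "treatyA"),
])

print(nx.is_bipartite(B))  # True: no within-set edges
# One-mode projection onto the actors: users linked via shared events.
actors = {n for n, d in B.nodes(data=True) if d["bipartite"] == 0}
print(sorted(bipartite.projected_graph(B, actors).edges()))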
Users can customize how data on a number of health indicators are presented, and the resulting tables, charts, and maps can be downloaded. Entire datasets are also available to download. Background Global Health Facts is a Kaiser Family Foundation website that provides global health data on the following topics: HIV/AIDS; TB; malaria; other conditions, diseases and risk indicators; programs, funding and financing; health workforce and capacity; demography and population; income and the economy. User Functionality Raw data (by topic) can be downloaded, or users can create customized reports, charts, graphs or tables to compare two or more countries on different health indicators. Specific profiles for just one country or for one health topic can also be generated. Users can view data as a table, chart or map. Rankings of countries are also available. Data Notes Data sources include UNAIDS, WHO, and the CIA, and links to the specific sources are provided. Annual data are updated as they become available. The most recent data are from 2009 (however, this varies by exposure), and the site does not specify when new data become available.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Eggs US fell to 3.21 USD/Dozen on July 30, 2025, down 3.24% from the previous day. Over the past month, the price of Eggs US has risen 25.14%, and it is up 17.05% compared to the same time last year, according to trading on a contract for difference (CFD) that tracks the benchmark market for this commodity. This dataset includes a chart with historical data for Eggs US.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Knowledge graph construction from heterogeneous data has seen a lot of uptake
in the last decade, from compliance-oriented approaches to performance
optimizations with respect to execution time. However, beyond execution time,
other metrics for comparing knowledge graph construction, e.g., CPU or memory
usage, are usually not considered. This challenge aims at benchmarking systems
to find which RDF graph construction system optimizes for metrics such as
execution time, CPU, memory usage, or a combination of these metrics.
Task description
The task is to reduce and report the execution time and computing resources
(CPU and memory usage) for the parameters listed in this challenge, compared
to the state of the art of existing tools and the baseline results provided
by this challenge. The challenge is not limited to execution time, i.e.,
creating the fastest pipeline, but also covers computing resources, i.e.,
achieving the most efficient pipeline.
We provide a tool which can execute such pipelines end-to-end. The tool also
collects and aggregates the metrics needed for this challenge, such as
execution time and CPU and memory usage, as CSV files. Moreover, information
about the hardware used during the execution of the pipeline is recorded as
well, to allow a fair comparison of different pipelines. Your pipeline should
consist of Docker images which can be executed on Linux to run the tool. The
tool has already been tested with existing systems, relational databases
(e.g., MySQL and PostgreSQL), and triplestores (e.g., Apache Jena Fuseki and
OpenLink Virtuoso), which can be combined in any configuration. You are
strongly encouraged to use this tool to participate in this challenge. If you
prefer to use a different tool, or if our tool imposes technical requirements
you cannot meet, please contact us directly.
Part 1: Knowledge Graph Construction Parameters
These parameters are evaluated using synthetically generated data to gain more
insight into their influence on the pipeline.
Data
Mappings
Part 2: GTFS-Madrid-Bench
The GTFS-Madrid-Bench provides insight into the pipeline with real data from
the public transport domain in Madrid.
Scaling
Heterogeneity
Example pipeline
The ground truth dataset and baseline results are generated in different steps
for each parameter:
The pipeline is executed 5 times, and the median execution time of each step
is calculated and reported. Each step with the median execution time is then
reported in the baseline results with all its measured metrics.
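For illustration, a minimal Python sketch of this aggregation step, assuming hypothetical per-run CSV files with "step" and "time" columns (the actual layout produced by the challenge tool may differ):

import csv
import statistics
from collections import defaultdict

# Hypothetical layout: one metrics CSV per run, each row holding a step
# name and its measured execution time in seconds.
runs = ["run1.csv", "run2.csv", "run3.csv", "run4.csv", "run5.csv"]

times = defaultdict(list)
for path in runs:
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            times[row["step"]].append(float(row["time"]))

# Report the median execution time of each step across the 5 runs.
for step, values in sorted(times.items()):
    print(f"{step}: median {statistics.median(values):.2f}s over {len(values)} runs")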
The query timeout is set to 1 hour and the knowledge graph construction
timeout to 24 hours. The execution is performed with the following tool:
https://github.com/kg-construct/challenge-tool; you can adapt the execution
plans of this example pipeline to your own needs.
Each parameter has its own directory in the ground truth dataset with the
following files:
metadata.json
Datasets
Knowledge Graph Construction Parameters
The dataset consists of:
Format
All input datasets are provided as CSV; depending on the parameter that is
being evaluated, the number of rows and columns may differ. The first row is
always the CSV header.
GTFS-Madrid-Bench
The dataset consists of:
Format
CSV datasets always have a header as their first row.
JSON and XML datasets have their own schema.
Evaluation criteria
Submissions must evaluate the following metrics:
Expected output
Duplicate values
Scale | Number of Triples |
---|---|
0 percent | 2000000 triples |
25 percent | 1500020 triples |
50 percent | 1000020 triples |
75 percent | 500020 triples |
100 percent | 20 triples |
Empty values
Scale | Number of Triples |
---|---|
0 percent | 2000000 triples |
25 percent | 1500000 triples |
50 percent | 1000000 triples |
75 percent | 500000 triples |
100 percent | 0 triples |
Mappings
Scale | Number of Triples |
---|---|
1TM + 15POM | 1500000 triples |
3TM + 5POM | 1500000 triples |
5TM + 3POM | 1500000 triples |
15TM + 1POM | 1500000 triples |
Properties
Scale | Number of Triples |
---|---|
1M rows 1 column | 1000000 triples |
1M rows 10 columns | 10000000 triples |
1M rows 20 columns | 20000000 triples |
1M rows 30 columns | 30000000 triples |
Records
Scale | Number of Triples |
---|---|
10K rows 20 columns | 200000 triples |
100K rows 20 columns | 2000000 triples |
1M rows 20 columns | 20000000 triples |
10M rows 20 columns | 200000000 triples |
Joins
1-1 joins
Scale | Number of Triples |
---|---|
0 percent | 0 triples |
25 percent | 125000 triples |
50 percent | 250000 triples |
75 percent | 375000 triples |
100 percent | 500000 triples |
1-N joins
Scale | Number of Triples |
---|---|
1-10 0 percent | 0 triples |
1-10 25 percent | 125000 triples |
1-10 50 percent | 250000 triples |
1-10 75 percent | 375000 triples |
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains information on application install interactions of users in the Myket Android application market. The dataset was created for evaluating interaction prediction models, which require user and item identifiers along with timestamps of the interactions. Hence, the dataset can be used for interaction prediction and for building a recommendation system. Furthermore, the data forms a dynamic network of interactions, and we can also perform network representation learning on the nodes in the network, which are users and applications.
Data Creation
The dataset was initially generated by the Myket data team, and later cleaned and subsampled by Erfan Loghmani, a master's student at Sharif University of Technology at the time. The data team focused on a two-week period and randomly sampled 1/3 of the users with interactions during that period. They then selected install and update interactions for three months before and after the two-week period, resulting in interactions spanning about 6 months and two weeks.
We further subsampled and cleaned the data to focus on application download interactions. We identified the top 8000 most installed applications and selected interactions related to them. We retained users with more than 32 interactions, resulting in 280,391 users. From this group, we randomly selected 10,000 users, and the data was filtered to include only interactions for these users. The detailed procedure can be found here.
Data Structure
The dataset has two main files.

myket.csv: This file contains the interaction information and follows the same format as the datasets used in the "JODIE: Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks" (ACM SIGKDD 2019) project. However, this data does not contain state labels and interaction features, so the associated columns are all zero.

app_info_sample.csv: This file comprises features associated with the applications present in the sample. For each individual application, information such as the approximate number of installs, average rating, count of ratings, and category is included. These features provide insights into the applications present in the dataset.

Dataset Details
For a detailed summary of the data's statistics, including information on users, applications, and interactions, please refer to the Python notebook available at summary-stats.ipynb. The notebook provides an overview of the dataset's characteristics and can be helpful for understanding the data's structure before using it for research or analysis.
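As a minimal illustration, the interaction file can be loaded with pandas; this sketch assumes the JODIE-style layout described above, with a header row and the user and item identifiers in the first two columns:

import pandas as pd

# Minimal sketch; myket.csv is assumed to have a header row, with the
# user and item identifiers in the first two columns (JODIE convention).
df = pd.read_csv("myket.csv")
user_col, item_col = df.columns[0], df.columns[1]

print("interactions:", len(df))
print("users:", df[user_col].nunique())
print("apps:", df[item_col].nunique())
print("mean interactions per user:", round(len(df) / df[user_col].nunique(), 1))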
Top 20 Most Installed Applications
Package Name | Count of Interactions |
---|---|
com.instagram.android | 15292 |
ir.resaneh1.iptv | 12143 |
com.tencent.ig | 7919 |
com.ForgeGames.SpecialForcesGroup2 | 7797 |
ir.nomogame.ClutchGame | 6193 |
com.dts.freefireth | 6041 |
com.whatsapp | 5876 |
com.supercell.clashofclans | 5817 |
com.mojang.minecraftpe | 5649 |
com.lenovo.anyshare.gps | 5076 |
ir.medu.shad | 4673 |
com.firsttouchgames.dls3 | 4641 |
com.activision.callofduty.shooter | 4357 |
com.tencent.iglite | 4126 |
com.aparat | 3598 |
com.kiloo.subwaysurf | 3135 |
com.supercell.clashroyale | 2793 |
co.palang.QuizOfKings | 2589 |
com.nazdika.app | 2436 |
com.digikala | 2413 |
Comparison with SNAP Datasets
The Myket dataset introduced in this repository exhibits distinct characteristics compared to the real-world datasets used by the JODIE project. The table below provides a comparative overview of the key dataset characteristics:
Dataset | #Users | #Items | #Interactions | Average Interactions per User | Average Unique Items per User |
---|---|---|---|---|---|
Myket | 10,000 | 7,988 | 694,121 | 69.4 | 54.6 |
LastFM | 980 | 1,000 | 1,293,103 | 1,319.5 | 158.2 |
Reddit | 10,000 | 984 | 672,447 | 67.2 | 7.9 |
Wikipedia | 8,227 | 1,000 | 157,474 | 19.1 | 2.2 |
MOOC | 7,047 | 97 | 411,749 | 58.4 | 25.3 |
The Myket dataset stands out by having an ample number of both users and items, highlighting its relevance for real-world, large-scale applications. Unlike the LastFM, Reddit, and Wikipedia datasets, where users exhibit repetitive item interactions, the Myket dataset contains comparatively few repetitive interactions. This characteristic reflects the diverse nature of user behaviors in the Android application market environment.
Citation
If you use this dataset in your research, please cite the following preprint:
@misc{loghmani2023effect,
title={Effect of Choosing Loss Function when Using T-batching for Representation Learning on Dynamic Networks},
author={Erfan Loghmani and MohammadAmin Fazli},
year={2023},
eprint={2308.06862},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
How much time do people spend on social media?
As of 2024, the average daily social media usage of internet users worldwide amounted to 143 minutes per day, down from 151 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of three hours and 49 minutes on social media each day. In comparison, the daily time spent with social media in the U.S. was just two hours and 16 minutes.

Global social media usage
Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to keep up with current events and stay in touch with friends.

Global impact of social media
Social media has a wide-reaching and significant impact on not only online activities but also offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased political polarization, and heightened everyday distractions.
10th DIMACS Implementation Challenge
Updated July 2012
http://www.cc.gatech.edu/dimacs10/index.shtml http://www.cise.ufl.edu/research/sparse/dimacs10
As stated on their main website ( http://dimacs.rutgers.edu/Challenges/ ), the "DIMACS Implementation Challenges address questions of determining realistic algorithm performance where worst case analysis is overly pessimistic and probabilistic models are too unrealistic: experimentation can provide guides to realistic algorithm performance where analysis fails."
For the 10th DIMACS Implementation Challenge, the two related problems of graph partitioning and graph clustering were chosen. Graph partitioning and graph clustering are among the aforementioned questions or problem areas where theoretical and practical results deviate significantly from each other, so that experimental outcomes are of particular interest.
Problem Motivation
Graph partitioning and graph clustering are ubiquitous subtasks in many application areas. Generally speaking, both techniques aim at the identification of vertex subsets with many internal and few external edges. To name only a few, problems addressed by graph partitioning and graph clustering algorithms are:
Challenge Goals
One goal of this Challenge is to create a reproducible picture of the state-of-the-art in the area of graph partitioning (GP) and graph clustering (GC) algorithms. To this end we are identifying a standard set of benchmark instances and generators.
Moreover, after initiating a discussion with the community, we would like to establish the most appropriate problem formulations and objective functions for a variety of applications.
Another goal is to enable current researchers to compare their codes with each other, in hopes of identifying the most effective algorithmic innovations that have been proposed.
The final goal is to publish proceedings containing results presented at the Challenge workshop, and a book containing the best of the proceedings papers.
Problems Addressed
The precise problem formulations need to be established in the course of the Challenge. The descriptions below serve as a starting point.
Graph partitioning:
The most common formulation of the graph partitioning problem for an undirected graph G = (V,E) asks for a division of V into k pairwise disjoint subsets (partitions) such that all partitions are of approximately equal size and the edge-cut, i.e., the total number of edges having their incident nodes in different subdomains, is minimized. The problem is known to be NP-hard.
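In standard notation (a textbook formulation of the problem just described, not an official Challenge definition), the problem asks to

\[
\min \ \mathrm{cut}(V_1,\dots,V_k) = \bigl|\{\,\{u,v\} \in E : u \in V_i,\ v \in V_j,\ i \neq j\,\}\bigr|
\]

subject to \(V_1 \cup \dots \cup V_k = V\), \(V_i \cap V_j = \emptyset\) for \(i \neq j\), and a balance constraint such as \(|V_i| \le (1+\varepsilon)\lceil |V|/k \rceil\) for all \(i\).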
Graph clustering:
Clustering is an important tool for investigating the structural properties of data. Generally speaking, clustering refers to the grouping of objects such that objects in the same cluster are more similar to each other than to objects of different clusters. The similarity measure depends on the underlying application. Clustering graphs usually refers to the identification of vertex subsets (clusters) that have significantly more internal edges (to vertices of the same cluster) than external ones (to vertices of another cluster).
There are 12 data sets in the DIMACS10 collection:
clustering: real-world graphs commonly used as benchmarks
coauthor: citation and co-author networks
Delaunay: Delaunay triangulations of random points in the plane
dyn-frames: frames from a 2D dynamic simulation
Kronecker: synthetic graphs from the Graph500 benchmark
numerical: graphs from numerical simulation
random: random geometric graphs (random points in the unit square)
streets: real-world street networks
Walshaw: Chris Walshaw's graph partitioning archive
matrix: graphs from the UF collection (not added here)
redistrict: census networks
star-mixtures: artificially generated from sets of real graphs
Some of the graphs already exist in the UF Collection. In some cases, the original graph is unsymmetric, with values, whereas the DIMACS graph is the symmetrized pattern of A+A'. Rather than adding duplicate patterns to the UF Collection, a MATLAB script is provided at http://www.cise.ufl.edu/research/sparse/dimacs10 which downloads each matrix from the UF Collection via UFget and then performs whatever operation is required to convert the matrix to the DIMACS graph problem. Also posted on that page is a MATLAB code (metis_graph) for reading the DIMACS *.graph files into MATLAB.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes multiple columns:
A. a sequential student ID
B. an ID that defines a random group label and the notation
C. the used notation: user stories or use cases
D. the case they were assigned to: IFA, Sim, or Hos
E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
F. a categorical representation of the grade as L/M/H, where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise
G. the total number of classes in the student's conceptual model
H. the total number of relationships in the student's conceptual model
I. the total number of classes in the expert's conceptual model
J. the total number of relationships in the expert's conceptual model
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below)
P. the researchers' judgement on how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present.
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either
with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is
incorrectly represented in the student model, either (i) via an attribute,
method, or relationship rather than a class, or
(ii) using a generic term (e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical
implementation aspect, e.g., access control. Classes that represent the legacy
system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in
CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in
CM-Expert.
All the calculations and information provided in the following sheets
originate from that raw data.
Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,
including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes. However, we also provide the size ratio for the number of relationships between the student and expert models.
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. It is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
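Written as formulas, using the tag counts defined above:

\[
\mathrm{correctness} = \frac{AL}{AL + OM + SO + WR},
\qquad
\mathrm{completeness} = \frac{AL + WR}{AL + WR + OM}.
\]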
For sheet 4, as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (t-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation - UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case - SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by the exam grades, converted to the categorical values High, Medium, and Low.
Data exchange with RO-Crates and Knowledge Graphs
Workshop at Open Science Festival 2023, Köln, Germany

Citation: Soiland-Reyes, S., Castro, L. J., & Rebholz-Schuhmann, D. (2023, July 5). Data exchange with RO-Crates and Knowledge Graphs. Zenodo. https://doi.org/10.5281/zenodo.10552449
Date: Wednesday 5th July 2023, 09:30-12:30 CEST
Room: Room John Nash (CECAD Lecture Hall)
Chairs: Stian Soiland-Reyes (RO-Crate, ELIXIR-UK, BY-COVID, FAIR-IMPACT, EOSC-Life, EuroScienceGateway); Leyla Jael Castro (Bioschemas, NFDI4DataScience, ZB MED Information Centre for Life Sciences); Dietrich Rebholz-Schuhmann (Scientific Director, ZB MED)

Abstract: Digital Objects (e.g., data, software) and formal knowledge representations (e.g., structured metadata) form the two sides of the Linked Open Science coin: scientists want to exchange their data as Open Science material, possibly embedding it into RO-Crates, while they also want to share their findings and knowledge in a formalised representation, possibly deploying Knowledge Graph representations. What is the meeting point between data distributed via RO-Crates and data that is part of a Knowledge Graph? Do RO-Crates and Knowledge Graphs serve different types of data, or do they complement each other? What would be the dis/advantages of having Knowledge Graphs in RO-Crates, or RO-Crates as nodes in Knowledge Graphs? In this workshop, we will first introduce RO-Crates and then have a round-table open discussion to compare the potential of the different approaches for capturing the data and the knowledge of the scientific world.

RO-Crate is a community effort to practically achieve FAIR packaging of research objects (digital objects like data, methods, software) with structured metadata. RO-Crate uses well-established Web standards and FAIR principles. For common metadata representations, RO-Crate builds on schema.org, a mature and general mark-up vocabulary used by search engines including Google Dataset Search. RO-Crate is adopted by many EU/EOSC projects as a pragmatic implementation of the FAIR Digital Objects vision.

Agenda (Wednesday 2023-07-05, times in CEST):
09:30 Overview of FAIR data publishing and RO-Crate - Leyla Jael Castro, Stian Soiland-Reyes (files included in this record)
09:50 A very brief introduction to making metadata with JSON-LD - Stian Soiland-Reyes (files included in this record)
10:00 Tutorial: FAIRify a dataset using just enough metadata - Leyla Jael Castro (FAIRify datasets with Bioschemas tutorial; files included in this record)
10:30 Tutorial: Packaging a dataset with its metadata as an RO-Crate - Stian Soiland-Reyes (see training material)
10:50 Making your own metadata profile - Stian Soiland-Reyes
11:00 Coffee break
11:30 Demo: Using Linked Data tooling to query knowledge graphs, and challenges - Stian Soiland-Reyes (https://github.com/stain/ro-crate-sparql/blob/main/ro-crate-sparql.ipynb)
11:50 Open discussion, feedback and requirements from early adopters - Moderator: Dietrich Rebholz-Schuhmann
12:20 Wrap-up and next steps - Lead: Stian Soiland-Reyes
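To give a flavour of what the JSON-LD and RO-Crate tutorials cover, here is a minimal Python sketch that writes a bare-bones ro-crate-metadata.json following the RO-Crate 1.1 layout; the dataset name and file are placeholders, not workshop material:

import json

# Minimal RO-Crate: a metadata descriptor, a root data entity ("./"),
# and one data file, all described in a flat JSON-LD @graph.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "Example dataset",  # placeholder
            "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
            "hasPart": [{"@id": "data.csv"}],
        },
        {"@id": "data.csv", "@type": "File", "name": "Example data file"},
    ],
}

with open("ro-crate-metadata.json", "w") as f:
    json.dump(crate, f, indent=2)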
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Knowledge graph construction from heterogeneous data has seen a lot of uptake
in the last decade, from compliance-oriented approaches to performance
optimizations with respect to execution time. However, beyond execution time,
other metrics for comparing knowledge graph construction, e.g., CPU or memory
usage, are usually not considered. This challenge aims at benchmarking systems
to find which RDF graph construction system optimizes for metrics such as
execution time, CPU, memory usage, or a combination of these metrics.
Task description
The task is to reduce and report the execution time and computing resources
(CPU and memory usage) for the parameters listed in this challenge, compared
to the state of the art of existing tools and the baseline results provided
by this challenge. The challenge is not limited to execution time, i.e.,
creating the fastest pipeline, but also covers computing resources, i.e.,
achieving the most efficient pipeline.
We provide a tool which can execute such pipelines end-to-end. The tool also
collects and aggregates the metrics needed for this challenge, such as
execution time and CPU and memory usage, as CSV files. Moreover, information
about the hardware used during the execution of the pipeline is recorded as
well, to allow a fair comparison of different pipelines. Your pipeline should
consist of Docker images which can be executed on Linux to run the tool. The
tool has already been tested with existing systems, relational databases
(e.g., MySQL and PostgreSQL), and triplestores (e.g., Apache Jena Fuseki and
OpenLink Virtuoso), which can be combined in any configuration. You are
strongly encouraged to use this tool to participate in this challenge. If you
prefer to use a different tool, or if our tool imposes technical requirements
you cannot meet, please contact us directly.
The set of new specifications for the RDF Mapping Language (RML), established by the W3C Community Group on Knowledge Graph Construction, provides test cases for each module:
These test cases are evaluated in this track of the challenge to determine their feasibility, correctness, etc., by applying them in implementations. This track is in beta status because the new specifications have not yet seen any implementation, so it may contain bugs and issues. If you find problems with the mappings, output, etc., please report them to the corresponding repository of each module.
Through this track we aim to spark the development of implementations for the new specifications and to improve the test cases. Let us know about your problems with the test cases and we will try to find a solution.
Part 1: Knowledge Graph Construction Parameters
These parameters are evaluated using synthetically generated data to gain more
insight into their influence on the pipeline.
Data
Mappings
Part 2: GTFS-Madrid-Bench
The GTFS-Madrid-Bench provides insight into the pipeline with real data from
the public transport domain in Madrid.
Scaling
Heterogeneity
Example pipeline
The ground truth dataset and baseline results are generated in different steps
for each parameter:
The pipeline is executed 5 times, and the median execution time of each step
is calculated and reported. Each step with the median execution time is then
reported in the baseline results with all its measured metrics.
The knowledge graph construction timeout is set to 24 hours.
The execution is performed with the following tool:
https://github.com/kg-construct/challenge-tool; you can adapt the execution
plans of this example pipeline to your own needs.
Each parameter has its own directory in the ground truth dataset with the
following files:
metadata.json
Datasets
Knowledge Graph Construction Parameters
The dataset consists of:
Format
All input datasets are provided as CSV; depending on the parameter that is
being evaluated, the number of rows and columns may differ. The first row is
always the CSV header.
GTFS-Madrid-Bench
The dataset consists of:
Format
CSV datasets always have a header as their first row.
JSON and XML datasets have their own schema.
Evaluation criteria
Submissions must evaluate the following metrics:
Expected output
Duplicate values
Scale | Number of Triples |
---|---|
0 percent | 2000000 triples |
25 percent | 1500020 triples |
50 percent | 1000020 triples |
75 percent | 500020 triples |
100 percent | 20 triples |
Empty values
Scale | Number of Triples |
---|---|
0 percent | 2000000 triples |
25 percent | 1500000 triples |
50 percent | 1000000 triples |
75 percent | 500000 triples |
100 percent | 0 triples |
Mappings
Scale | Number of Triples |
---|---|
1TM + 15POM | 1500000 triples |
3TM + 5POM | 1500000 triples |
5TM + 3POM | 1500000 triples |
15TM + 1POM | 1500000 triples |
Properties
Scale | Number of Triples |
---|---|
1M rows 1 column | 1000000 triples |
1M rows 10 columns | 10000000 triples |
1M rows 20 columns | 20000000 triples |
1M rows 30 columns | 30000000 triples |
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cost of food in India decreased 1.06 percent in June of 2025 over the same month in the previous year. This dataset provides - India Food Inflation - actual values, historical data, forecast, chart, statistics, economic calendar and news.
Introduction
This dataset contains the data described in the paper "A deep neural network approach to predicting clinical outcomes of neuroblastoma patients" by Tranchevent, Azuaje and Rajapakse. More precisely, this dataset contains the topological features extracted from graphs built from publicly available expression data (see details below). This dataset does not contain the original expression data, which are available elsewhere. We thank the scientists who generated and shared these data (please see below the relevant links and publications).

Content
File names start with the name of the publicly available dataset they are built on (among "Fischer", "Maris" and "Versteeg"). This name is followed by a tag indicating whether they contain raw data ("raw", which means, in this case, the raw topological features) or TensorFlow-formatted data ("TF"). This tag is then followed by a unique identifier representing a unique configuration. The configuration file "Global_configuration.tsv" contains details about these configurations, such as which topological features are present and which clinical outcome is considered. The code associated with the same manuscript that uses these data is at https://gitlab.com/biomodlih/SingalunDeep. The procedure by which the raw data are transformed into the TensorFlow-ready data is described in the paper.

File format
All files are TSV files that correspond to matrices with samples as rows and features as columns (or clinical data as columns for clinical data files). The data files contain various sets of topological features that were extracted from the sample graphs (or Patient Similarity Networks, PSN). The clinical files contain the relevant clinical outcomes. The raw data files only contain the topological data. For instance, the file "Fischer_raw_2d0000_data.tsv" contains 24 values for each sample, corresponding to the 12 centralities computed for both the microarray (Fischer-M) and RNA-seq (Fischer-R) datasets. The TensorFlow-ready files do not contain the sample identifiers in the first column. However, they contain two extra columns at the end. The first extra column is the sample weights (for the classifiers, and because we very often have a dominant class). The second extra column is the class labels (binary), based on the clinical outcome of interest.

Dataset details
The Fischer dataset is used to train, evaluate and validate the models, so the dataset is split into train/eval/valid files, which contain respectively 249, 125 and 124 rows (samples) of the original 498 samples. In contrast, the other two datasets (Maris and Versteeg) are smaller and are only used for validation (and therefore have no training or evaluation file). The Fischer dataset also has more data files because various configurations were tested (see manuscript). In contrast, the validation using the Maris and Versteeg datasets is only done for a single configuration, and there are therefore fewer files. For Fischer, a few configurations are listed in the global configuration file but there is no corresponding raw data. This is because these items are derived from concatenations of the original raw data (see global configuration file and manuscript for details).

References
This dataset is associated with Tranchevent L., Azuaje F., Rajapakse J.C., A deep neural network approach to predicting clinical outcomes of neuroblastoma patients.
If you use these data in your research, please do not forget to also cite the researchers who generated the original expression datasets.

Fischer dataset:
Zhang W. et al., Comparison of RNA-seq and microarray-based models for clinical endpoint prediction. Genome Biology 16(1) (2015). doi:10.1186/s13059-015-0694-1
Wang C. et al., The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance. Nat. Biotechnol. 32(9), 926–932. doi:10.1038/nbt.3001

Versteeg dataset:
Molenaar J.J. et al., Sequencing of neuroblastoma identifies chromothripsis and defects in neuritogenesis genes. Nature 483(7391), 589–593. doi:10.1038/nature10910

Maris dataset:
Wang Q. et al., Integrative genomics identifies distinct molecular classes of neuroblastoma and shows that multiple genes are targeted by regional alterations in DNA copy number. Cancer Res. 66(12), 6050–6062. doi:10.1158/0008-5472.CAN-05-4618

Project supported by the Fonds National de la Recherche (FNR), Luxembourg (SINGALUN project). This research was also partially supported by Tier-2 grant MOE2016-T2-1-029 from the Ministry of Education, Singapore.
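As an illustration of the file format described above, a short Python sketch; the file name is hypothetical (following the naming scheme), and the split into features, weights and labels follows the description of the TensorFlow-ready files:

import pandas as pd

# Hypothetical file name following the naming scheme described above.
df = pd.read_csv("Fischer_TF_2d0000_train.tsv", sep="\t", header=None)

# TensorFlow-ready files: topological features first, then two extra
# columns at the end (sample weights and binary class labels).
features = df.iloc[:, :-2].to_numpy()
weights = df.iloc[:, -2].to_numpy()
labels = df.iloc[:, -1].to_numpy().astype(int)
print(features.shape, weights.shape, labels.shape)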
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Inflation Rate in India decreased to 2.10 percent in June from 2.82 percent in May of 2025. This dataset provides - India Inflation Rate - actual values, historical data, forecast, chart, statistics, economic calendar and news.
As of January 2024, #love was the most used hashtag on Instagram, being included in over two billion posts on the social media platform. #Instagood and #instagram were used over one billion times as of early 2024.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Inflation Rate in Turkey decreased to 35.05 percent in June from 35.41 percent in May of 2025. This dataset provides the latest reported value for - Turkey Inflation Rate - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The USD/VND exchange rate fell to 26,199.0000 on July 31, 2025, down 0.01% from the previous session. Over the past month, the Vietnamese Dong has weakened 0.26%, and is down by 3.90% over the last 12 months. Vietnamese Dong - values, historical data, forecasts and news - updated on July of 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This README documents the datasets and RDF graph used in the research article "AI-Powered Food Sustainability: Exploiting Knowledge Graphs for Reducing Carbon Footprints and Land Use" by Anand K. Gavai, Suniti Vadalkar, and Mahak Sharma. The study employs AI-driven knowledge graphs to analyze the environmental impacts of food items, focusing on protein sources, and to propose sustainability interventions aligned with the United Nations Sustainable Development Goals (SDGs). The datasets and RDF graph provided here support the construction and querying of the knowledge graph for sustainability analysis.
All data and associated source code are publicly available in a Zenodo repository:
DOI: 10.5281/zenodo.10143973
The datasets consist of structured environmental data for various food items, integrated into a knowledge graph to assess sustainability metrics such as carbon footprint, land use, water use, scarcity-weighted water use, and eutrophication. The data primarily focuses on global averages from 2010, sourced from Poore & Nemecek (2018), with an emphasis on protein-rich foods (e.g., beef, cheese, legumes) and other dietary staples.
The data were sourced from:
The repository includes CSV files and an RDF graph in Turtle format:
Five CSV files provide environmental metrics for 38 food items, focusing on 2010 global averages:
# Prefix declarations assumed for illustration; see the Zenodo repository for the actual namespaces.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://example.org/foodkg#> .

:Beef rdf:type :FoodItem ;
    :hasCarbonFootprint 33.30 ;
    :hasLandUse 43.24 ;
    :hasWaterUse 2714 ;
    :hasScarcityWeightedWaterUse 119805 ;
    :hasEutrophication 365.29 .
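As a sketch of how such a graph can be queried, a minimal Python example using rdflib; the namespace and the carbon-footprint threshold are assumptions for illustration, not taken from the repository:

from rdflib import Graph

# Load the (hypothetical) Turtle excerpt above and ask for food items
# whose carbon footprint exceeds a threshold.
ttl = """
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://example.org/foodkg#> .

:Beef rdf:type :FoodItem ;
    :hasCarbonFootprint 33.30 ;
    :hasLandUse 43.24 .
"""

g = Graph()
g.parse(data=ttl, format="turtle")

query = """
PREFIX : <http://example.org/foodkg#>
SELECT ?food ?cf WHERE {
    ?food a :FoodItem ;
          :hasCarbonFootprint ?cf .
    FILTER (?cf > 10)
}
"""
for food, cf in g.query(query):
    print(food, cf)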
The datasets cover the following environmental impact metrics:
These datasets and the RDF graph were used to:
Example applications:
All datasets and the RDF graph are available in the Zenodo repository:
The repository includes:
This work was supported by the ‘High Tech for a Sustainable Future’ capacity building programme of the 4TU Federation in the Netherlands.
For questions or further information, please contact the corresponding author:
If you use this dataset or RDF graph, please cite the original manuscript:
Gavai, A.K., Vadalkar, S., Sharma, M. (2025). AI-Powered Food Sustainability: Exploiting Knowledge Graphs for Reducing Carbon Footprints and Land Use.