49 datasets found
  1. Chart Viewer

    • city-of-lawrenceville-arcgis-hub-lville.hub.arcgis.com
    Updated Sep 22, 2021
    Cite
    esri_en (2021). Chart Viewer [Dataset]. https://city-of-lawrenceville-arcgis-hub-lville.hub.arcgis.com/items/be4582b38d764de0a970b986c824acde
    Explore at:
    Dataset updated
    Sep 22, 2021
    Dataset authored and provided by
    esri_en
    Description

    Use the Chart Viewer template to display bar charts, line charts, pie charts, histograms, and scatterplots to complement a map. Include multiple charts to view with a map or side by side with other charts for comparison. Up to three charts can be viewed side by side or stacked, but you can access and view all the charts that are authored in the map.

    Examples:
    • Present a bar chart representing average property value by county for a given area.
    • Compare charts based on multiple population statistics in your dataset.
    • Display an interactive scatterplot based on two values in your dataset along with an essential set of map exploration tools.

    Data requirements
    The Chart Viewer template requires a map with at least one chart configured.

    Key app capabilities
    • Multiple layout options - Choose Stack to display charts stacked with the map, or choose Side by side to display charts side by side with the map.
    • Manage chart - Reorder, rename, or turn charts on and off in the app.
    • Multiselect chart - Compare two charts in the panel at the same time.
    • Bookmarks - Allow users to zoom and pan to a collection of preset extents that are saved in the map.
    • Home, Zoom controls, Legend, Layer List, Search

    Supportability
    This web app is designed responsively to be used in browsers on desktops, mobile phones, and tablets. We are committed to ongoing efforts towards making our apps as accessible as possible. Please feel free to leave a comment on how we can improve the accessibility of our apps for those who use assistive technologies.

  2. NetVotes ENIC Dataset

    • zenodo.org
    • explore.openaire.eu
    txt, zip
    Updated Oct 1, 2024
    Cite
    Israel Mendonça; Vincent Labatut; Vincent Labatut; Rosa Figueiredo; Rosa Figueiredo; Israel Mendonça (2024). NetVotes ENIC Dataset [Dataset]. http://doi.org/10.5281/zenodo.6815510
    Explore at:
    zip, txt (available download formats)
    Dataset updated
    Oct 1, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Israel Mendonça; Vincent Labatut; Vincent Labatut; Rosa Figueiredo; Rosa Figueiredo; Israel Mendonça
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description. The NetVote dataset contains the outputs of the NetVote program when applied to voting data coming from VoteWatch (http://www.votewatch.eu/).

    These results were used in the following conference papers:

    1. I. Mendonça, R. Figueiredo, V. Labatut, and P. Michelon, “Relevance of Negative Links in Graph Partitioning: A Case Study Using Votes From the European Parliament,” in 2nd European Network Intelligence Conference, 2015, pp. 122–129. ⟨hal-01176090⟩ DOI: 10.1109/ENIC.2015.25
    2. I. Mendonça, R. Figueiredo, V. Labatut, and P. Michelon, “Informative Value of Negative Links for Graph Partitioning, with an application to European Parliament Votes,” in 6ème Conférence sur les modèles et l'analyse de réseaux : approches mathématiques et informatiques, 2015, p. 12p. ⟨hal-02055158⟩

    Source code. The NetVote source code is available on GitHub: https://github.com/CompNet/NetVotes.

    Citation. If you use our dataset or tool, please cite article [1] above.


    @InProceedings{Mendonca2015,
      author    = {Mendonça, Israel and Figueiredo, Rosa and Labatut, Vincent and Michelon, Philippe},
      title     = {Relevance of Negative Links in Graph Partitioning: A Case Study Using Votes From the {E}uropean {P}arliament},
      booktitle = {2\textsuperscript{nd} European Network Intelligence Conference ({ENIC})},
      year      = {2015},
      pages     = {122-129},
      address   = {Karlskrona, SE},
      publisher = {IEEE Publishing},
      doi       = {10.1109/ENIC.2015.25},
    }

    -------------------------

    Details. This archive contains the following folders:

    • `votewatch_data`: the raw data extracted from the VoteWatch website.
      • `VoteWatch Europe European Parliament, Council of the EU.csv`: list of the documents voted during the considered term, with some details such as the date and topic.
      • `votes_by_document`: this folder contains a collection of CSV files, each one describing the outcome of the vote session relatively to one specific document.
      • `intermediate_files`: this folder contains several CSV files:
        • `allvotes.csv`: concatenation of all vote outcomes for all documents and all MEPs. It can be considered a compact representation of the data contained in the folder `votes_by_document`.
        • `loyalty.csv`: same as `allvotes.csv`, but for loyalty (i.e. whether or not the MEP voted like the majority of the MEPs in their political group).
        • `MPs.csv`: list of the MEPs having voted at least once in the considered term, with their details.
        • `policies.csv`: list of the topics considered during the term.
        • `qtd_docs.csv`: list of the topics with the corresponding number of documents.
    • `parallel_ils_results`: contains the raw results of the ILS tool. This is an external algorithm able to estimate the optimal partition of the network nodes in terms of structural balance. It was applied to all the networks extracted by our scripts (from the VoteWatch data), and the produced files were placed here for postprocessing. Each subfolder corresponds to one of the topic-year pairs.
    • `output_files`: contains the files produced by our scripts.
      • `agreement`: histograms representing the distributions of agreement and rebellion indices. Each subfolder corresponds to a specific topic.
      • `community_algorithms_csv`: performance of the partitioning algorithms (for both community detection and correlation clustering). Each subfolder corresponds to a specific topic.
      • `xxxx_cluster_information.csv`: table containing several variants of the imbalance measure, for the considered algorithms.
      • `community_algorithms_results`: Comparison of the partitions detected by the various algorithms considered, and distribution of the cluster/community sizes. Each subfolder corresponds to a specific topic.
      • `xxxx_cluster_comparison.csv`: table comparing the partitions detected by the community detection algorithms, in terms of Rand index and other measures.
      • `xxxx_ils_cluster_comparison.csv`: like `xxxx_cluster_comparison.csv`, except we compare the partition of community detection algorithms with that of the ILS.
      • `xxxx_yyyy_distribution.pdf`: histogram of the community (or cluster) sizes detected by algorithm `yyyy`.
      • `graphs`: the networks extracted from the vote data. Each subfolder corresponds to a specific topic.
      • `xxxx_complete_graph.graphml`: network in GraphML format, with all the information: nodes, edges, nodal attributes (including communities), weights, etc.
      • `xxxx_edges_Gephi.csv`: only the links, with their weights (i.e. vote similarity).
      • `xxxx_graph.g`: network in the `.g` format (for the ILS).
      • `xxxx_net_measures.csv`: table containing some stats on the network (number of links, etc.).
      • `xxxx_nodes_Gephi.csv`: list of nodes (i.e. MEPs), with details.
      • `plots`: synthesis plots from the paper.
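
    As a quick illustration of how these graph files can be consumed, here is a minimal sketch assuming Python with networkx and pandas; the topic subfolder name and the `xxxx` prefix are placeholders, as in the listing above:

    import networkx as nx
    import pandas as pd

    # Minimal sketch: load one extracted network and its Gephi edge list.
    # "some_topic" and "xxxx" are placeholders for a topic-year pair.
    g = nx.read_graphml("output_files/graphs/some_topic/xxxx_complete_graph.graphml")
    print(g.number_of_nodes(), g.number_of_edges())

    # Edge weights encode the signed vote similarities between MEPs.
    edges = pd.read_csv("output_files/graphs/some_topic/xxxx_edges_Gephi.csv")
    print(edges.head())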

    -------------------------

    License. These data are shared under a Creative Commons 0 license.

    Contact. Vincent Labatut <vincent.labatut@univ-avignon.fr> & Rosa Figueiredo <rosa.figueiredo@univ-avignon.fr>

  3. Comparison of AR.

    • plos.figshare.com
    xls
    Updated Jul 5, 2024
    Cite
    Shuangmei Wang; Fengjie Sun (2024). Comparison of AR. [Dataset]. http://doi.org/10.1371/journal.pone.0302490.t002
    Explore at:
    xls (available download formats)
    Dataset updated
    Jul 5, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Shuangmei Wang; Fengjie Sun
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The role of a knowledge graph encompasses the representation, organization, retrieval, reasoning, and application of knowledge, providing a rich and robust cognitive foundation for artificial intelligence systems and applications. When we learn new things, discover that some old information was wrong, observe changes and progress, or adopt new technology standards, we need to update knowledge graphs. However, in some environments the initial knowledge cannot be known; for example, we cannot access the full source code of a piece of software, even if we have purchased it. In such circumstances, is there a way to update a knowledge graph without prior knowledge? In this paper, we investigate whether there is a method for this situation within the framework of Dalal revision operators. We first prove that finding the optimal solution in this environment is a strongly NP-complete problem. We then propose two algorithms, Flaccid_search and Tight_search, which have different conditions, and we prove that both algorithms can find the desired results.

  4. Data from: Tweeting #RamNavami: A Comparison of Approaches to Analyzing...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 14, 2023
    Cite
    Heaney, Michael (2023). Tweeting #RamNavami: A Comparison of Approaches to Analyzing Bipartite Networks [Dataset]. http://doi.org/10.7910/DVN/HD45EI
    Explore at:
    Dataset updated
    Nov 14, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Heaney, Michael
    Description

    Bipartite networks, also known as two-mode networks or affiliation networks, are a class of networks in which actors or objects are partitioned into two sets, with interactions taking place across but not within sets. These networks are omnipresent in society, encompassing phenomena such as student-teacher interactions, coalition structures, and international treaty participation. With growing data availability and proliferation in statistical estimators and software, scholars have increasingly sought to understand the methods available to model the data generating processes in these networks. This article compares three methods for doing so: (1) Logit; (2) the bipartite Exponential Random Graph Model (ERGM); and (3) the Relational Event Model (REM). This comparison demonstrates the relevance of choices with respect to dependence structures, temporality, parameter specification, and data structure. Considering the example of Ram Navami, a Hindu festival celebrating the birth of Lord Ram, the ego network of tweets using #RamNavami on April 21, 2021 is examined. The results of the analysis illustrate that critical modeling choices make a difference in the estimated parameters and the conclusions to be drawn from them.
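
    To make the two-mode structure concrete, here is a small illustrative sketch (not drawn from this dataset; node names are invented) that builds a bipartite network in Python with networkx and projects it onto the actor mode:

    import networkx as nx
    from networkx.algorithms import bipartite

    # Minimal sketch of a two-mode (affiliation) network: actors on one side,
    # hashtags on the other; edges run only across the two sets.
    B = nx.Graph()
    B.add_nodes_from(["u1", "u2", "u3"], bipartite=0)           # actors
    B.add_nodes_from(["#RamNavami", "#festival"], bipartite=1)  # affiliations
    B.add_edges_from([("u1", "#RamNavami"), ("u2", "#RamNavami"),
                      ("u2", "#festival"), ("u3", "#festival")])

    # One-mode projection: actors are linked if they share a hashtag.
    actors = {n for n, d in B.nodes(data=True) if d["bipartite"] == 0}
    P = bipartite.projected_graph(B, actors)
    print(P.edges())  # (u1, u2) and (u2, u3)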

  5. Global Health Facts

    • datamed.org
    Updated May 19, 2016
    Cite
    (2016). Global Health Facts [Dataset]. https://datamed.org/display-item.php?repository=0012&idName=ID&id=56d4b851e4b0e644d31324fc
    Explore at:
    Dataset updated
    May 19, 2016
    Description

    Users can customize how data on a number of health indicators are presented, and the resulting tables, charts, and maps can be downloaded. Entire datasets are also available to download.

    Background
    Global Health Facts is a Kaiser Family Foundation website that provides global health data on the following topics: HIV/AIDS; TB; malaria; other conditions, diseases, and risk indicators; programs, funding, and financing; health workforce and capacity; demography and population; income and the economy.

    User Functionality
    Raw data (by topic) can be downloaded, or users can create customized reports, charts, graphs, or tables to compare two or more countries on different health indicators. Specific profiles for just one country or for one health topic can also be generated. Users can view data as a table, chart, or map. Rankings of countries are also available.

    Data Notes
    Data sources include UNAIDS, WHO, and the CIA, and links to the specific sources are provided. Annual data are updated as they become available. The most recent data are from 2009 (however, this varies by exposure), and the site does not specify when new data become available.

  6. Eggs US - Price Data

    • tradingeconomics.com
    • de.tradingeconomics.com
    • +13 more
    csv, excel, json, xml
    Updated Jul 30, 2025
    Cite
    TRADING ECONOMICS (2025). Eggs US - Price Data [Dataset]. https://tradingeconomics.com/commodity/eggs-us
    Explore at:
    excel, csv, xml, json (available download formats)
    Dataset updated
    Jul 30, 2025
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    May 25, 2012 - Jul 30, 2025
    Area covered
    World
    Description

    Eggs US fell to 3.21 USD/Dozen on July 30, 2025, down 3.24% from the previous day. Over the past month, Eggs US's price has risen 25.14%, and is up 17.05% compared to the same time last year, according to trading on a contract for difference (CFD) that tracks the benchmark market for this commodity. This dataset includes a chart with historical data for Eggs US.

  7. Data from: KGCW 2023 Challenge @ ESWC 2023

    • zenodo.org
    • investigacion.usc.gal
    application/gzip
    Updated Apr 15, 2024
    + more versions
    Cite
    Dylan Van Assche; Dylan Van Assche; David Chaves-Fraga; David Chaves-Fraga; Anastasia Dimou; Anastasia Dimou; Umutcan Şimşek; Umutcan Şimşek; Ana Iglesias; Ana Iglesias (2024). KGCW 2023 Challenge @ ESWC 2023 [Dataset]. http://doi.org/10.5281/zenodo.7837289
    Explore at:
    application/gzip (available download formats)
    Dataset updated
    Apr 15, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Dylan Van Assche; Dylan Van Assche; David Chaves-Fraga; David Chaves-Fraga; Anastasia Dimou; Anastasia Dimou; Umutcan Şimşek; Umutcan Şimşek; Ana Iglesias; Ana Iglesias
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Knowledge Graph Construction Workshop 2023: challenge

    Knowledge graph construction of heterogeneous data has seen a lot of uptake
    in the last decade, from compliance to performance optimizations with
    respect to execution time. However, beyond execution time as a metric for
    comparing knowledge graph construction, other metrics, e.g. CPU or memory
    usage, are usually not considered. This challenge aims at benchmarking
    systems to find which RDF graph construction system optimizes for metrics
    such as execution time, CPU, memory usage, or a combination of these.

    Task description

    The task is to reduce and report the execution time and computing resources
    (CPU and memory usage) for the parameters listed in this challenge, compared
    to the state-of-the-art of the existing tools and the baseline results provided
    by this challenge. This challenge is not limited to execution times to create
    the fastest pipeline, but also computing resources to achieve the most efficient
    pipeline.

    We provide a tool which can execute such pipelines end-to-end. This tool also
    collects and aggregates the metrics such as execution time, CPU and memory
    usage, necessary for this challenge as CSV files. Moreover, the information
    about the hardware used during the execution of the pipeline is available as
    well to allow fairly comparing different pipelines. Your pipeline should consist
    of Docker images which can be executed on Linux to run the tool. The tool is
    already tested with existing systems, relational databases e.g. MySQL and
    PostgreSQL, and triplestores e.g. Apache Jena Fuseki and OpenLink Virtuoso
    which can be combined in any configuration. It is strongly encouraged to use
    this tool for participating in this challenge. If you prefer to use a different
    tool or our tool imposes technical requirements you cannot solve, please contact
    us directly.

    Part 1: Knowledge Graph Construction Parameters

    These parameters are evaluated using synthetically generated data to gain
    more insight into their influence on the pipeline.

    Data

    • Number of data records: scaling the data size vertically by the number of records with a fixed number of data properties (10K, 100K, 1M, 10M records).
    • Number of data properties: scaling the data size horizontally by the number of data properties with a fixed number of data records (1, 10, 20, 30 columns).
    • Number of duplicate values: scaling the number of duplicate values in the dataset (0%, 25%, 50%, 75%, 100%).
    • Number of empty values: scaling the number of empty values in the dataset (0%, 25%, 50%, 75%, 100%).
    • Number of input files: scaling the number of datasets (1, 5, 10, 15).

    Mappings

    • Number of subjects: scaling the number of subjects with a fixed number of predicates and objects (1, 10, 20, 30 TMs).
    • Number of predicates and objects: scaling the number of predicates and objects with a fixed number of subjects (1, 10, 20, 30 POMs).
    • Number and type of joins: scaling the number of joins and the type of joins (1-1, N-1, 1-N, N-M).

    Part 2: GTFS-Madrid-Bench

    The GTFS-Madrid-Bench provides insights in the pipeline with real data from the
    public transport domain in Madrid.

    Scaling

    • GTFS-1 SQL
    • GTFS-10 SQL
    • GTFS-100 SQL
    • GTFS-1000 SQL

    Heterogeneity

    • GTFS-100 XML + JSON
    • GTFS-100 CSV + XML
    • GTFS-100 CSV + JSON
    • GTFS-100 SQL + XML + JSON + CSV

    Example pipeline

    The ground truth dataset and baseline results are generated in different steps
    for each parameter:

    1. The provided CSV files and SQL schema are loaded into a MySQL relational database.
    2. Mappings are executed by accessing the MySQL relational database to construct a knowledge graph in N-Triples as RDF format.
    3. The constructed knowledge graph is loaded into a Virtuoso triplestore, tuned according to the Virtuoso documentation.
    4. The provided SPARQL queries are executed on the SPARQL endpoint exposed by Virtuoso.

    The pipeline is executed 5 times from which the median execution time of each
    step is calculated and reported. Each step with the median execution time is
    then reported in the baseline results with all its measured metrics.
    Query timeout is set to 1 hour and knowledge graph construction timeout
    to 24 hours. The execution is performed with the following tool: https://github.com/kg-construct/challenge-tool;
    you can adapt the execution plans for this example pipeline to your own needs.

    Each parameter has its own directory in the ground truth dataset with the
    following files:

    • Input dataset as CSV.
    • Mapping file as RML.
    • Queries as SPARQL.
    • Execution plan for the pipeline in metadata.json.

    Datasets

    Knowledge Graph Construction Parameters

    The dataset consists of:

    • Input dataset as CSV for each parameter.
    • Mapping file as RML for each parameter.
    • SPARQL queries to retrieve the results for each parameter.
    • Baseline results for each parameter with the example pipeline.
    • Ground truth dataset for each parameter generated with the example pipeline.

    Format

    All input datasets are provided as CSV; depending on the parameter being
    evaluated, the number of rows and columns may differ. The first row is always
    the header of the CSV.

    GTFS-Madrid-Bench

    The dataset consists of:

    • Input dataset as CSV with SQL schema for the scaling, and a combination of XML, CSV, and JSON for the heterogeneity.
    • Mapping file as RML for both scaling and heterogeneity.
    • SPARQL queries to retrieve the results.
    • Baseline results with the example pipeline.
    • Ground truth dataset generated with the example pipeline.

    Format

    CSV datasets always have a header as their first row.
    JSON and XML datasets have their own schema.

    Evaluation criteria

    Submissions must evaluate the following metrics:

    • Execution time of all the steps in the pipeline. The execution time of a step is the difference between the begin and end time of a step.
    • CPU time as the time spent in the CPU for all steps of the pipeline. The CPU time of a step is the difference between the begin and end CPU time of a step.
    • Minimal and maximal memory consumption for each step of the pipeline. The minimal and maximal memory consumption of a step is the minimum and maximum calculated of the memory consumption during the execution of a step.
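
    To make these definitions concrete, here is a rough sketch in Python using psutil. This is not the official challenge-tool; a real monitor would also sample memory periodically during the step rather than only at its boundaries:

    import time
    import psutil

    def measure_step(run_step):
        """Sketch of the metrics above: execution time as the difference
        between begin and end wall-clock time, CPU time as the difference
        between begin and end CPU time, and min/max memory consumption
        (sampled here only at the step boundaries)."""
        proc = psutil.Process()
        t0 = time.monotonic()
        cpu0 = proc.cpu_times()
        mem_samples = [proc.memory_info().rss]
        run_step()  # execute one pipeline step
        mem_samples.append(proc.memory_info().rss)
        cpu1 = proc.cpu_times()
        return {
            "execution_time_s": time.monotonic() - t0,
            "cpu_time_s": (cpu1.user + cpu1.system) - (cpu0.user + cpu0.system),
            "memory_min_bytes": min(mem_samples),
            "memory_max_bytes": max(mem_samples),
        }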

    Expected output

    Duplicate values

    Scale | Number of Triples
    0 percent | 2000000 triples
    25 percent | 1500020 triples
    50 percent | 1000020 triples
    75 percent | 500020 triples
    100 percent | 20 triples

    Empty values

    Scale | Number of Triples
    0 percent | 2000000 triples
    25 percent | 1500000 triples
    50 percent | 1000000 triples
    75 percent | 500000 triples
    100 percent | 0 triples

    Mappings

    Scale | Number of Triples
    1TM + 15POM | 1500000 triples
    3TM + 5POM | 1500000 triples
    5TM + 3POM | 1500000 triples
    15TM + 1POM | 1500000 triples

    Properties

    Scale | Number of Triples
    1M rows, 1 column | 1000000 triples
    1M rows, 10 columns | 10000000 triples
    1M rows, 20 columns | 20000000 triples
    1M rows, 30 columns | 30000000 triples

    Records

    Scale | Number of Triples
    10K rows, 20 columns | 200000 triples
    100K rows, 20 columns | 2000000 triples
    1M rows, 20 columns | 20000000 triples
    10M rows, 20 columns | 200000000 triples

    Joins

    1-1 joins

    Scale | Number of Triples
    0 percent | 0 triples
    25 percent | 125000 triples
    50 percent | 250000 triples
    75 percent | 375000 triples
    100 percent | 500000 triples

    1-N joins

    Scale | Number of Triples
    1-10, 0 percent | 0 triples
    1-10, 25 percent | 125000 triples
    1-10, 50 percent | 250000 triples
    1-10, 75 percent | 375000

  8. Myket Android Application Install Dataset

    • zenodo.org
    bin, csv
    Updated Aug 23, 2023
    + more versions
    Cite
    Erfan Loghmani; MohammadAmin Fazli; Erfan Loghmani; MohammadAmin Fazli (2023). Myket Android Application Install Dataset [Dataset]. http://doi.org/10.48550/arxiv.2308.06862
    Explore at:
    bin, csv (available download formats)
    Dataset updated
    Aug 23, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Erfan Loghmani; MohammadAmin Fazli; Erfan Loghmani; MohammadAmin Fazli
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains information on application install interactions of users in the Myket android application market. The dataset was created for the purpose of evaluating interaction prediction models, requiring user and item identifiers along with timestamps of the interactions. Hence, the dataset can be used for interaction prediction and building a recommendation system. Furthermore, the data forms a dynamic network of interactions, and we can also perform network representation learning on the nodes in the network, which are users and applications.

    Data Creation

    The dataset was initially generated by the Myket data team, and later cleaned and subsampled by Erfan Loghmani, a master's student at Sharif University of Technology at the time. The data team focused on a two-week period and randomly sampled 1/3 of the users with interactions during that period. They then selected install and update interactions for three months before and after the two-week period, resulting in interactions spanning about six months and two weeks.

    We further subsampled and cleaned the data to focus on application download interactions. We identified the top 8000 most installed applications and selected interactions related to them. We retained users with more than 32 interactions, resulting in 280,391 users. From this group, we randomly selected 10,000 users, and the data was filtered to include only interactions for these users. The detailed procedure can be found here.

    Data Structure

    The dataset has two main files.

    • myket.csv: This file contains the interaction information and follows the same format as the datasets used in the "JODIE: Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks" (ACM SIGKDD 2019) project. However, this dataset does not contain state labels or interaction features, so the associated columns are all zero.
    • app_info_sample.csv: This file comprises features associated with applications present in the sample. For each individual application, information such as the approximate number of installs, average rating, count of ratings, and category are included. These features provide insights into the applications present in the dataset.
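
    As a usage sketch (assuming the standard JODIE CSV layout described above; check the header of your copy before relying on column positions), the interactions can be loaded with pandas:

    import pandas as pd

    # Minimal sketch: load the interaction triplets. Per the description,
    # state-label and feature columns exist but are all zero in this dataset.
    df = pd.read_csv("myket.csv")
    print(df.shape)               # expect 694,121 interaction rows
    print(df.iloc[:, :3].head())  # user, item (application), timestamp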

    Dataset Details

    • Total Instances: 694,121 install interaction instances
    • Instances Format: Triplets of user_id, app_name, timestamp
    • 10,000 users and 7,988 android applications
    • Item features for 7,606 applications

    For a detailed summary of the data's statistics, including information on users, applications, and interactions, please refer to the Python notebook available at summary-stats.ipynb. The notebook provides an overview of the dataset's characteristics and can be helpful for understanding the data's structure before using it for research or analysis.

    Top 20 Most Installed Applications

    Package Name | Count of Interactions
    com.instagram.android | 15292
    ir.resaneh1.iptv | 12143
    com.tencent.ig | 7919
    com.ForgeGames.SpecialForcesGroup2 | 7797
    ir.nomogame.ClutchGame | 6193
    com.dts.freefireth | 6041
    com.whatsapp | 5876
    com.supercell.clashofclans | 5817
    com.mojang.minecraftpe | 5649
    com.lenovo.anyshare.gps | 5076
    ir.medu.shad | 4673
    com.firsttouchgames.dls3 | 4641
    com.activision.callofduty.shooter | 4357
    com.tencent.iglite | 4126
    com.aparat | 3598
    com.kiloo.subwaysurf | 3135
    com.supercell.clashroyale | 2793
    co.palang.QuizOfKings | 2589
    com.nazdika.app | 2436
    com.digikala | 2413

    Comparison with SNAP Datasets

    The Myket dataset introduced in this repository exhibits distinct characteristics compared to the real-world datasets used by the project. The table below provides a comparative overview of the key dataset characteristics:

    Dataset | #Users | #Items | #Interactions | Average Interactions per User | Average Unique Items per User
    Myket | 10,000 | 7,988 | 694,121 | 69.4 | 54.6
    LastFM | 980 | 1,000 | 1,293,103 | 1,319.5 | 158.2
    Reddit | 10,000 | 984 | 672,447 | 67.2 | 7.9
    Wikipedia | 8,227 | 1,000 | 157,474 | 19.1 | 2.2
    MOOC | 7,047 | 97 | 411,749 | 58.4 | 25.3

    The Myket dataset stands out by having an ample number of both users and items, highlighting its relevance for real-world, large-scale applications. Unlike LastFM, Reddit, and Wikipedia datasets, where users exhibit repetitive item interactions, the Myket dataset contains a comparatively lower amount of repetitive interactions. This unique characteristic reflects the diverse nature of user behaviors in the Android application market environment.

    Citation

    If you use this dataset in your research, please cite the following preprint:

    @misc{loghmani2023effect,
       title={Effect of Choosing Loss Function when Using T-batching for Representation Learning on Dynamic Networks}, 
       author={Erfan Loghmani and MohammadAmin Fazli},
       year={2023},
       eprint={2308.06862},
       archivePrefix={arXiv},
       primaryClass={cs.LG}
    }
    
  9. Average daily time spent on social media worldwide 2012-2024

    • statista.com
    • es.statista.com
    + more versions
    Cite
    Stacy Jo Dixon, Average daily time spent on social media worldwide 2012-2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    How much time do people spend on social media?

    As of 2024, the average daily social media usage of internet users worldwide amounted to 143 minutes per day, down from 151 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of three hours and 49 minutes on social media each day. In comparison, the daily time spent with social media in the U.S. was just two hours and 16 minutes.

    Global social media usage
    Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events and friends.

    Global impact of social media
    Social media has a wide-reaching and significant impact on not only online activities but also offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased polarization in politics, and heightened everyday distractions.
  10. Group DIMACS10 Dataset

    • paperswithcode.com
    Updated Jul 15, 2012
    Cite
    (2012). Group DIMACS10 Dataset [Dataset]. https://paperswithcode.com/dataset/group-dimacs10-law-suitesparse-matrix
    Explore at:
    Dataset updated
    Jul 15, 2012
    Description

    10th DIMACS Implementation Challenge

    Updated July 2012

    http://www.cc.gatech.edu/dimacs10/index.shtml
    http://www.cise.ufl.edu/research/sparse/dimacs10

    As stated on their main website ( http://dimacs.rutgers.edu/Challenges/ ), the "DIMACS Implementation Challenges address questions of determining realistic algorithm performance where worst case analysis is overly pessimistic and probabilistic models are too unrealistic: experimentation can provide guides to realistic algorithm performance where analysis fails."

    For the 10th DIMACS Implementation Challenge, the two related problems of graph partitioning and graph clustering were chosen. Graph partitioning and graph clustering are among the aforementioned questions or problem areas where theoretical and practical results deviate significantly from each other, so that experimental outcomes are of particular interest.

    Problem Motivation

    Graph partitioning and graph clustering are ubiquitous subtasks in many application areas. Generally speaking, both techniques aim at the identification of vertex subsets with many internal and few external edges. To name only a few, problems addressed by graph partitioning and graph clustering algorithms are:

    • What are the communities within an (online) social network?
    • How do I speed up a numerical simulation by mapping it efficiently onto a parallel computer?
    • How must components be organized on a computer chip such that they can communicate efficiently with each other?
    • What are the segments of a digital image?
    • Which functions are certain genes (most likely) responsible for?

    Challenge Goals

    • One goal of this Challenge is to create a reproducible picture of the state-of-the-art in the area of graph partitioning (GP) and graph clustering (GC) algorithms. To this end we are identifying a standard set of benchmark instances and generators.

    • Moreover, after initiating a discussion with the community, we would like to establish the most appropriate problem formulations and objective functions for a variety of applications.

    • Another goal is to enable current researchers to compare their codes with each other, in hopes of identifying the most effective algorithmic innovations that have been proposed.

    • The final goal is to publish proceedings containing results presented at the Challenge workshop, and a book containing the best of the proceedings papers.

    Problems Addressed

    The precise problem formulations need to be established in the course of the Challenge. The descriptions below serve as a starting point.

    • Graph partitioning:

      The most common formulation of the graph partitioning problem for an undirected graph G = (V,E) asks for a division of V into k pairwise disjoint subsets (partitions) such that all partitions are of approximately equal size and the edge-cut, i.e., the total number of edges having their incident nodes in different subdomains, is minimized. The problem is known to be NP-hard.

    • Graph clustering:

      Clustering is an important tool for investigating the structural properties of data. Generally speaking, clustering refers to the grouping of objects such that objects in the same cluster are more similar to each other than to objects of different clusters. The similarity measure depends on the underlying application. Clustering graphs usually refers to the identification of vertex subsets (clusters) that have significantly more internal edges (to vertices of the same cluster) than external ones (to vertices of another cluster).

    There are 12 data sets in the DIMACS10 collection:

    • clustering: real-world graphs commonly used as benchmarks
    • coauthor: citation and co-author networks
    • Delaunay: Delaunay triangulations of random points in the plane
    • dyn-frames: frames from a 2D dynamic simulation
    • Kronecker: synthetic graphs from the Graph500 benchmark
    • numerical: graphs from numerical simulation
    • random: random geometric graphs (random points in the unit square)
    • streets: real-world street networks
    • Walshaw: Chris Walshaw's graph partitioning archive
    • matrix: graphs from the UF collection (not added here)
    • redistrict: census networks
    • star-mixtures: artificially generated from sets of real graphs

    Some of the graphs already exist in the UF Collection. In some cases, the original graph is unsymmetric, with values, whereas the DIMACS graph is the symmetrized pattern of A+A'. Rather than add duplicate patterns to the UF Collection, a MATLAB script is provided at http://www.cise.ufl.edu/research/sparse/dimacs10 which downloads each matrix from the UF Collection via UFget, and then performs whatever operation is required to convert the matrix to the DIMACS graph problem. Also posted at that page is a MATLAB code (metis_graph) for reading the DIMACS *.graph files into MATLAB.
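
    The *.graph files follow the METIS adjacency format: a header line giving the number of vertices and edges (plus an optional format code), followed by one line per vertex listing its 1-based neighbors. As a rough Python analogue of the metis_graph MATLAB code (unweighted case only, an assumption to check against your files), a reader could look like:

    def read_metis_graph(path):
        """Minimal sketch of a reader for unweighted METIS *.graph files:
        header = "<n_vertices> <n_edges> [fmt]", then line i lists the
        1-based neighbors of vertex i. Weighted variants are not handled."""
        with open(path) as f:
            lines = [ln for ln in f if not ln.startswith("%")]  # skip comments
        n_vertices, n_edges = map(int, lines[0].split()[:2])
        adjacency = [list(map(int, ln.split())) for ln in lines[1:n_vertices + 1]]
        return n_vertices, n_edges, adjacency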

    https://sparse.tamu.edu/DIMACS10

  11. UC_vs_US Statistic Analysis.xlsx

    • figshare.com
    xlsx
    Updated Jul 9, 2020
    Cite
    F. (Fabiano) Dalpiaz (2020). UC_vs_US Statistic Analysis.xlsx [Dataset]. http://doi.org/10.23644/uu.12631628.v1
    Explore at:
    xlsx (available download formats)
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Utrecht University
    Authors
    F. (Fabiano) Dalpiaz
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes multiple columns:
    A. a sequential student ID
    B. an ID that defines a random group label and the notation
    C. the used notation: User Stories or Use Cases
    D. the case they were assigned to: IFA, Sim, or Hos
    E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
    F. a categorical representation of the grade as L/M/H, where H is greater than or equal to 80, M is at least 65 and below 80, and L otherwise
    G. the total number of classes in the student's conceptual model
    H. the total number of relationships in the student's conceptual model
    I. the total number of classes in the expert's conceptual model
    J. the total number of relationships in the expert's conceptual model
    K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below)
    P. the researchers' judgement on how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present

    Tagging scheme:
    • Aligned (AL) - A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
    • Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than a class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
    • System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent a legacy system or the system under design (portal, simulator) are legitimate;
    • Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud;
    • Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.

    All the calculations and information provided in the following sheets originate from that raw data.

    Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection, including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.

    Sheet 3 (Size-Ratio): The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes. However, we also provided the size ratio for the number of relationships between student and expert model.

    Sheet 4 (Overall): Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that is fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
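
    For readers who want the two ratios at a glance, here is a small sketch of the calculation described above (the tag counts in the example are hypothetical, not taken from the data):

    def correctness(al, wr, so, om):
        # Ratio of student-model classes fully aligned with the expert model.
        return al / (al + om + so + wr)

    def completeness(al, wr, om):
        # Ratio of expert-model classes represented, correctly or not.
        return (al + wr) / (al + wr + om)

    # Hypothetical tag counts for one subject:
    print(correctness(al=12, wr=3, so=2, om=5))  # 0.545...
    print(completeness(al=12, wr=3, om=5))       # 0.75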

    For Sheet 4, as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:

    Sheet 5 (By-Notation): Model correctness and model completeness are compared by notation - UC, US.

    Sheet 6 (By-Case): Model correctness and model completeness are compared by case - SIM, HOS, IFA.

    Sheet 7 (By-Process): Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.

    Sheet 8 (By-Grade): Model correctness and model completeness are compared by the exam grades, converted to the categorical values High, Medium, and Low.

  12. Data exchange with RO-Crates and Knowledge Graphs

    • explore.openaire.eu
    Updated Jul 5, 2023
    Cite
    Stian Soiland-Reyes; Leyla Jael Castro; Dietrich Rebholz-Schuhmann (2023). Data exchange with RO-Crates and Knowledge Graphs [Dataset]. http://doi.org/10.5281/zenodo.10552449
    Explore at:
    Dataset updated
    Jul 5, 2023
    Authors
    Stian Soiland-Reyes; Leyla Jael Castro; Dietrich Rebholz-Schuhmann
    Description

    Data exchange with RO-Crates and Knowledge Graphs
    Workshop at Open Science Festival 2023, Köln, Germany

    Citation: Soiland-Reyes, S., Castro, L. J., & Rebholz-Schuhmann, D. (2023, July 5). Data exchange with RO-Crates and Knowledge Graphs. Zenodo. https://doi.org/10.5281/zenodo.10552449
    Date: Wednesday 5 July 2023, 09:30-12:30 CEST
    Room: John Nash (CECAD Lecture Hall)
    Chairs: Stian Soiland-Reyes (RO-Crate, ELIXIR-UK, BY-COVID, FAIR-IMPACT, EOSC-Life, EuroScienceGateway); Leyla Jael Castro (Bioschemas, NFDI4DataScience, ZB MED Information Centre for Life Sciences); Dietrich Rebholz-Schuhmann (Scientific Director, ZB MED)

    Abstract: Digital Objects (e.g., data, software) and formal knowledge representations (e.g., structured metadata) form the two sides of the Linked Open Science coin: scientists want to exchange their data as Open Science material, possibly embedding it into RO-Crates, while they also want to share their findings and knowledge in a formalised representation, possibly deploying Knowledge Graph representations. What is the meeting point between data distributed via RO-Crates and data that is part of a Knowledge Graph? Do RO-Crates and Knowledge Graphs serve different types of data, or do they complement each other? What would be the dis/advantages of having Knowledge Graphs in RO-Crates, or RO-Crates as nodes in Knowledge Graphs? In this workshop, we will first introduce RO-Crates and then have a round-table open discussion to compare the potentials of the different approaches for capturing the data and the knowledge of the scientific world. RO-Crate is a community effort to practically achieve FAIR packaging of research objects (digital objects like data, methods, software) with structured metadata. RO-Crate uses well-established Web standards and FAIR principles. For common metadata representations, RO-Crate builds on schema.org, a mature and general mark-up vocabulary used by search engines including Google Dataset Search. RO-Crate is adopted by many EU/EOSC projects as a pragmatic implementation of the FAIR Digital Objects vision.

    Agenda (times in CEST):
    • 09:30 Overview of FAIR data publishing and RO-Crate - Leyla Jael Castro, Stian Soiland-Reyes (files included in this record)
    • 09:50 A very brief introduction to making metadata with JSON-LD - Stian Soiland-Reyes (files included in this record)
    • 10:00 Tutorial: FAIRify a dataset using just enough metadata - Leyla Jael Castro (FAIRify datasets with Bioschemas tutorial; files included in this record)
    • 10:30 Tutorial: Packaging a dataset with its metadata as a RO-Crate - Stian Soiland-Reyes (see training material)
    • 10:50 Making your own metadata profile - Stian Soiland-Reyes
    • 11:00 Coffee break
    • 11:30 Demo: Using Linked Data tooling to query knowledge graphs and challenges - Stian Soiland-Reyes (https://github.com/stain/ro-crate-sparql/blob/main/ro-crate-sparql.ipynb)
    • 11:50 Open discussion, feedback and requirements from early adopters - Moderator: Dietrich Rebholz-Schuhmann
    • 12:20 Wrap-up and next steps - Lead: Stian Soiland-Reyes
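
    In the spirit of the 11:30 demo, the sketch below shows the general idea of querying an RO-Crate's metadata as a knowledge graph: the crate's ro-crate-metadata.json file is JSON-LD, so it can be parsed into an RDF graph and queried with SPARQL. This is a minimal illustration assuming Python with rdflib (version 6 or later parses JSON-LD natively), not the workshop's own notebook:

    from rdflib import Graph

    # Minimal sketch: load an RO-Crate's JSON-LD metadata as an RDF graph
    # and query it with SPARQL.
    g = Graph()
    g.parse("ro-crate-metadata.json", format="json-ld")

    # List every entity that has a schema.org name.
    query = """
    PREFIX schema: <http://schema.org/>
    SELECT ?entity ?name WHERE { ?entity schema:name ?name }
    """
    for entity, name in g.query(query):
        print(entity, name)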

  13. KGCW 2024 Challenge @ ESWC 2024

    • zenodo.org
    application/gzip
    Updated Mar 11, 2024
    + more versions
    Cite
    Dylan Van Assche; Dylan Van Assche; David Chaves-Fraga; David Chaves-Fraga; Anastasia Dimou; Anastasia Dimou; Umutcan Serles; Umutcan Serles; Ana Iglesias; Ana Iglesias (2024). KGCW 2024 Challenge @ ESWC 2024 [Dataset]. http://doi.org/10.5281/zenodo.10721875
    Explore at:
    application/gzip (available download formats)
    Dataset updated
    Mar 11, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Dylan Van Assche; Dylan Van Assche; David Chaves-Fraga; David Chaves-Fraga; Anastasia Dimou; Anastasia Dimou; Umutcan Serles; Umutcan Serles; Ana Iglesias; Ana Iglesias
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Knowledge Graph Construction Workshop 2024: challenge

    Knowledge graph construction of heterogeneous data has seen a lot of uptake
    in the last decade, from compliance to performance optimizations with
    respect to execution time. However, beyond execution time as a metric for
    comparing knowledge graph construction, other metrics, e.g. CPU or memory
    usage, are usually not considered. This challenge aims at benchmarking
    systems to find which RDF graph construction system optimizes for metrics
    such as execution time, CPU, memory usage, or a combination of these.

    Task description

    The task is to reduce and report the execution time and computing resources
    (CPU and memory usage) for the parameters listed in this challenge, compared
    to the state-of-the-art of the existing tools and the baseline results provided
    by this challenge. This challenge is not limited to execution times to create
    the fastest pipeline, but also computing resources to achieve the most efficient
    pipeline.

    We provide a tool which can execute such pipelines end-to-end. This tool also
    collects and aggregates the metrics such as execution time, CPU and memory
    usage, necessary for this challenge as CSV files. Moreover, the information
    about the hardware used during the execution of the pipeline is available as
    well to allow fairly comparing different pipelines. Your pipeline should consist
    of Docker images which can be executed on Linux to run the tool. The tool is
    already tested with existing systems, relational databases e.g. MySQL and
    PostgreSQL, and triplestores e.g. Apache Jena Fuseki and OpenLink Virtuoso
    which can be combined in any configuration. It is strongly encouraged to use
    this tool for participating in this challenge. If you prefer to use a different
    tool or our tool imposes technical requirements you cannot solve, please contact
    us directly.

    Track 1: Conformance

    The set of new specifications for the RDF Mapping Language (RML), established by the W3C Community Group on Knowledge Graph Construction, provides a set of test cases for each module.

    These test cases are evaluated in this Track of the Challenge to determine their feasibility, correctness, etc. by applying them in implementations. This Track is in Beta status because these new specifications have not seen any implementation yet; it may thus contain bugs and issues. If you find problems with the mappings, output, etc., please report them to the corresponding repository of each module.

    Through this Track we aim to spark development of implementations for the new specifications and improve the test-cases. Let us know your problems with the test-cases and we will try to find a solution.
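
    As a rough sketch of what checking a single test case could look like (not the official harness; file names are placeholders), the produced and expected RDF can be compared as isomorphic graphs with rdflib:

    from rdflib import Graph
    from rdflib.compare import to_isomorphic

    # Minimal sketch: parse the produced and expected N-Triples files and
    # compare them as isomorphic graphs (robust to blank-node renaming).
    produced = Graph().parse("output.nt", format="nt")
    expected = Graph().parse("expected_output.nt", format="nt")
    print(to_isomorphic(produced) == to_isomorphic(expected))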

    Track 2: Performance

    Part 1: Knowledge Graph Construction Parameters

    These parameters are evaluated using synthetically generated data to gain
    more insight into their influence on the pipeline.

    Data

    • Number of data records: scaling the data size vertically by the number of records with a fixed number of data properties (10K, 100K, 1M, 10M records).
    • Number of data properties: scaling the data size horizontally by the number of data properties with a fixed number of data records (1, 10, 20, 30 columns).
    • Number of duplicate values: scaling the number of duplicate values in the dataset (0%, 25%, 50%, 75%, 100%).
    • Number of empty values: scaling the number of empty values in the dataset (0%, 25%, 50%, 75%, 100%).
    • Number of input files: scaling the number of datasets (1, 5, 10, 15).

    Mappings

    • Number of subjects: scaling the number of subjects with a fixed number of predicates and objects (1, 10, 20, 30 TMs).
    • Number of predicates and objects: scaling the number of predicates and objects with a fixed number of subjects (1, 10, 20, 30 POMs).
    • Number and type of joins: scaling the number of joins and the type of joins (1-1, N-1, 1-N, N-M).

    Part 2: GTFS-Madrid-Bench

    The GTFS-Madrid-Bench provides insights in the pipeline with real data from the
    public transport domain in Madrid.

    Scaling

    • GTFS-1 SQL
    • GTFS-10 SQL
    • GTFS-100 SQL
    • GTFS-1000 SQL

    Heterogeneity

    • GTFS-100 XML + JSON
    • GTFS-100 CSV + XML
    • GTFS-100 CSV + JSON
    • GTFS-100 SQL + XML + JSON + CSV

    Example pipeline

    The ground truth dataset and baseline results are generated in different steps
    for each parameter:

    1. The provided CSV files and SQL schema are loaded into a MySQL relational database.
    2. Mappings are executed by accessing the MySQL relational database to construct a knowledge graph in N-Triples as RDF format.

    The pipeline is executed 5 times from which the median execution time of each
    step is calculated and reported. Each step with the median execution time is
    then reported in the baseline results with all its measured metrics.
    Knowledge graph construction timeout is set to 24 hours.
    The execution is performed with the following tool: https://github.com/kg-construct/challenge-tool;
    you can adapt the execution plans for this example pipeline to your own needs.

    Each parameter has its own directory in the ground truth dataset with the
    following files:

    • Input dataset as CSV.
    • Mapping file as RML.
    • Execution plan for the pipeline in metadata.json.

    Datasets

    Knowledge Graph Construction Parameters

    The dataset consists of:

    • Input dataset as CSV for each parameter.
    • Mapping file as RML for each parameter.
    • Baseline results for each parameter with the example pipeline.
    • Ground truth dataset for each parameter generated with the example pipeline.

    Format

    All input datasets are provided as CSV; depending on the parameter being
    evaluated, the number of rows and columns may differ. The first row is always
    the header of the CSV.

    GTFS-Madrid-Bench

    The dataset consists of:

    • Input dataset as CSV with SQL schema for the scaling, and a combination of XML, CSV, and JSON for the heterogeneity.
    • Mapping file as RML for both scaling and heterogeneity.
    • SPARQL queries to retrieve the results.
    • Baseline results with the example pipeline.
    • Ground truth dataset generated with the example pipeline.

    Format

    CSV datasets always have a header as their first row.
    JSON and XML datasets have their own schema.

    Evaluation criteria

    Submissions must evaluate the following metrics:

    • Execution time of all the steps in the pipeline. The execution time of a step is the difference between the begin and end time of a step.
    • CPU time as the time spent in the CPU for all steps of the pipeline. The CPU time of a step is the difference between the begin and end CPU time of a step.
    • Minimal and maximal memory consumption for each step of the pipeline. The minimal and maximal memory consumption of a step is the minimum and maximum calculated of the memory consumption during the execution of a step.

    Expected output

    Duplicate values

    Scale | Number of Triples
    0 percent | 2000000 triples
    25 percent | 1500020 triples
    50 percent | 1000020 triples
    75 percent | 500020 triples
    100 percent | 20 triples

    Empty values

    Scale | Number of Triples
    0 percent | 2000000 triples
    25 percent | 1500000 triples
    50 percent | 1000000 triples
    75 percent | 500000 triples
    100 percent | 0 triples

    Mappings

    Scale | Number of Triples
    1TM + 15POM | 1500000 triples
    3TM + 5POM | 1500000 triples
    5TM + 3POM | 1500000 triples
    15TM + 1POM | 1500000 triples

    Properties

    Scale | Number of Triples
    1M rows, 1 column | 1000000 triples
    1M rows, 10 columns | 10000000 triples
    1M rows, 20 columns | 20000000 triples
    1M rows, 30 columns | 30000000

  14. India Food Inflation

    • tradingeconomics.com
    • zh.tradingeconomics.com
    • +13 more
    csv, excel, json, xml
    Updated Aug 3, 2015
    Cite
    TRADING ECONOMICS (2015). India Food Inflation [Dataset]. https://tradingeconomics.com/india/food-inflation
    Explore at:
    excel, xml, json, csv (available download formats)
    Dataset updated
    Aug 3, 2015
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 31, 2012 - Jun 30, 2025
    Area covered
    India
    Description

    Cost of food in India decreased 1.06 percent in June of 2025 over the same month in the previous year. This dataset provides - India Food Inflation - actual values, historical data, forecast, chart, statistics, economic calendar and news.

  15. Graph topological features extracted from expression profiles of...

    • explore.openaire.eu
    Updated Aug 7, 2019
    Cite
    Léon-Charles Tranchevent; Francisco Azuaje; Jagath C Rajapakse (2019). Graph topological features extracted from expression profiles of neuroblastoma patients [Dataset]. http://doi.org/10.5281/zenodo.3357673
    Explore at:
    Dataset updated
    Aug 7, 2019
    Authors
    Léon-Charles Tranchevent; Francisco Azuaje; Jagath C Rajapakse
    Description

    Introduction This dataset contains the data described in the paper titled "A deep neural network approach to predicting clinical outcomes of neuroblastoma patients." by Tranchevent, Azuaje and Rajapakse. More precisely, this dataset contains the topological features extracted from graphs built from publicly available expression data (see details below). This dataset does not contain the original expression data, which are available elsewhere. We thank the scientists who did generate and share these data (please see below the relevant links and publications). Content File names start with the name of the publicly available dataset they are built on (among "Fischer", "Maris" and "Versteeg"). This name is followed by a tag representing whether they contain raw data ("raw", which means, in this case, the raw topological features) or TF formatted data ("TF", which stands for TensorFlow). This tag is then followed by a unique identifier representing a unique configuration. The configuration file "Global_configuration.tsv" contains details about these configurations such as which topological features are present and which clinical outcome is considered. The code associated to the same manuscript that uses these data is at https://gitlab.com/biomodlih/SingalunDeep. The procedure by which the raw data are transformed into the TensorFlow ready data is described in the paper. File format All files are TSV files that correspond to matrices with samples as rows and features as columns (or clinical data as columns for clinical data files). The data files contain various sets of topological features that were extracted from the sample graphs (or Patient Similarity Networks - PSN). The clinical files contain relevant clinical outcomes. The raw data files only contain the topological data. For instance, the file "Fischer_raw_2d0000_data_tsv" contains 24 values for each sample corresponding to the 12 centralities computed for both the microarray (Fischer-M) and RNA-seq (Fischer-R) datasets. The TensorFlow ready files do not contain the sample identifiers in the first column. However, they contain two extra columns at the end. The first extra column is the sample weights (for the classifiers and because we very often have a dominant class). The second extra column is the class labels (binary), based on the clinical outcome of interest. Dataset details The Fischer dataset is used to train, evaluate and validate the models, so the dataset is split into train / eval / valid files, which contains respectively 249, 125 and 124 rows (samples) of the original 498 samples. In contrast, the other two datasets (Maris and Versteeg) are smaller and are only used for validation (and therefore have no training or evaluation file). The Fischer dataset also has more data files because various configurations were tested (see manuscript). In contrast, the validation, using the Maris and Versteeg datasets is only done for a single configuration and there are therefore less files. For Fischer, a few configurations are listed in the global configuration file but there is no corresponding raw data. This is because these items are derived from concatenations of the original raw data (see global configuration file and manuscript for details). References This dataset is associated with Tranchevent L., Azuaje F.. Rajapakse J.C., A deep neural network approach to predicting clinical outcomes of neuroblastoma patients. 
    References

    This dataset is associated with Tranchevent L., Azuaje F., Rajapakse J.C., A deep neural network approach to predicting clinical outcomes of neuroblastoma patients.

    If you use these data in your research, please do not forget to also cite the researchers who generated the original expression datasets.

    Fischer dataset:
    • Zhang W. et al., Comparison of RNA-seq and microarray-based models for clinical endpoint prediction. Genome Biology 16(1) (2015). doi:10.1186/s13059-015-0694-1
    • Wang C. et al., The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance. Nat. Biotechnol. 32(9), 926–932. doi:10.1038/nbt.3001

    Versteeg dataset:
    • Molenaar J.J. et al., Sequencing of neuroblastoma identifies chromothripsis and defects in neuritogenesis genes. Nature 483(7391), 589–593. doi:10.1038/nature10910

    Maris dataset:
    • Wang Q. et al., Integrative genomics identifies distinct molecular classes of neuroblastoma and shows that multiple genes are targeted by regional alterations in DNA copy number. Cancer Res. 66(12), 6050–6062. doi:10.1158/0008-5472.CAN-05-4618

    This project was supported by the Fonds National de la Recherche (FNR), Luxembourg (SINGALUN project). This research was also partially supported by Tier-2 grant MOE2016-T2-1-029 from the Ministry of Education, Singapore.

  16. India Inflation Rate

    • tradingeconomics.com
    • fa.tradingeconomics.com
    • +13more
    csv, excel, json, xml
    Updated Jul 14, 2025
    Cite
    TRADING ECONOMICS (2025). India Inflation Rate [Dataset]. https://tradingeconomics.com/india/inflation-cpi
    Explore at:
    csv, xml, excel, json (available download formats)
    Dataset updated
    Jul 14, 2025
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 31, 2012 - Jun 30, 2025
    Area covered
    India
    Description

    Inflation Rate in India decreased to 2.10 percent in June from 2.82 percent in May of 2025. This dataset provides actual values, historical data, forecasts, charts, statistics, an economic calendar and news for the India Inflation Rate.

  17. Instagram: most used hashtags 2024

    • statista.com
    • es.statista.com
    + more versions
    Cite
    Statista Research Department, Instagram: most used hashtags 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    As of January 2024, #love was the most used hashtag on Instagram, being included in over two billion posts on the social media platform. #Instagood and #instagram were used over one billion times as of early 2024.

  18. Turkey Inflation Rate

    • tradingeconomics.com
    • fa.tradingeconomics.com
    • +13more
    csv, excel, json, xml
    Updated Jun 3, 2025
    Cite
    TRADING ECONOMICS (2025). Turkey Inflation Rate [Dataset]. https://tradingeconomics.com/turkey/inflation-cpi
    Explore at:
    json, excel, xml, csv (available download formats)
    Dataset updated
    Jun 3, 2025
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 31, 1965 - Jun 30, 2025
    Area covered
    Turkey
    Description

    Inflation Rate in Turkey decreased to 35.05 percent in June from 35.41 percent in May of 2025. This dataset provides the latest reported value for the Turkey Inflation Rate, plus previous releases, historical highs and lows, short-term forecasts and long-term predictions, an economic calendar, survey consensus and news.

  19. Vietnamese Dong Data

    • tradingeconomics.com
    • fr.tradingeconomics.com
    • +13more
    csv, excel, json, xml
    Updated Jun 15, 2025
    Cite
    TRADING ECONOMICS (2025). Vietnamese Dong Data [Dataset]. https://tradingeconomics.com/vietnam/currency
    Explore at:
    excel, json, xml, csv (available download formats)
    Dataset updated
    Jun 15, 2025
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Aug 4, 1994 - Jul 31, 2025
    Area covered
    Vietnam
    Description

    The USD/VND exchange rate fell to 26,199.0000 on July 31, 2025, down 0.01% from the previous session. Over the past month, the Vietnamese Dong has weakened 0.26%, and it is down 3.90% over the last 12 months. Vietnamese Dong - values, historical data, forecasts and news - updated in July 2025.

  20. AI-Powered Food Sustainability: Exploiting Knowledge Graphs for Reducing Carbon Footprints and Land Use

    • zenodo.org
    csv, xml
    Updated May 5, 2025
    Cite
    Anand Gavai; Anand Gavai (2025). AI-Powered Food Sustainability: Exploiting Knowledge Graphs for Reducing Carbon Footprints and Land Use [Dataset]. http://doi.org/10.5281/zenodo.14916809
    Explore at:
    csv, xml (available download formats)
    Dataset updated
    May 5, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anand Gavai; Anand Gavai
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview


    This README documents the datasets and RDF graph used in the research article "AI-Powered Food Sustainability: Exploiting Knowledge Graphs for Reducing Carbon Footprints and Land Use" by Anand K. Gavai, Suniti Vadalkar, and Mahak Sharma. The study employs AI-driven knowledge graphs to analyze the environmental impacts of food items, focusing on protein sources, and to propose sustainability interventions aligned with the United Nations Sustainable Development Goals (SDGs). The datasets and RDF graph provided here support the construction and querying of the knowledge graph for sustainability analysis.


    All data and associated source code are publicly available in a Zenodo repository:
    DOI: 10.5281/zenodo.10143973


    Data Description


    The datasets consist of structured environmental data for various food items, integrated into a knowledge graph to assess sustainability metrics such as carbon footprint, land use, water use, scarcity-weighted water use, and eutrophication. The data primarily focuses on global averages from 2010, sourced from Poore & Nemecek (2018), with an emphasis on protein-rich foods (e.g., beef, cheese, legumes) and other dietary staples.


    Sources


    The data were sourced from:


    1. Poore & Nemecek (2018): A comprehensive study on the environmental impacts of food production, providing global metrics for greenhouse gas (GHG) emissions, land use, and freshwater withdrawals.
      • Citation: Poore, J., Nemecek, T., 2018. Reducing food’s environmental impacts through producers and consumers. Science 360, 987–992. DOI: 10.1126/science.aaq0216

    2. OurWorldInData.org: supplies additional sustainability metrics, including scarcity-weighted water use and eutrophication, complementing the Poore & Nemecek dataset.

    File Formats and Contents


    The repository includes CSV files and an RDF graph in Turtle format:


    CSV Files


    Five CSV files provide environmental metrics for 38 food items, focusing on 2010 global averages; a short loading sketch follows the list:


    1. GHG Emissions (Two Files)
      • Filename: ghg_emissions_per_kg.csv (two identical versions provided)

      • Columns: Entity (food item), Year (2010), GHG emissions per kilogram (Poore & Nemecek, 2018) (kg CO₂-equivalent per kg)

      • Example: Beef (dairy herd), 2010, 33.30

    2. Freshwater Withdrawals
      • Filename: freshwater_withdrawals_per_kg.csv

      • Columns: Entity (food item), Year (2010), Freshwater withdrawals per kilogram (Poore & Nemecek, 2018) (liters per kg)

      • Example: Cheese, 2010, 5605.2

    3. Land Use
      • Filename: land_use_per_kg.csv

      • Columns: Entity (food item), Year (2010), Land use per kilogram (Poore & Nemecek, 2018) (m² per kg)

      • Example: Nuts, 2010, 12.96

    4. Comprehensive Metrics (Protein Sources)
      • Filename: protein_source_metrics.csv

      • Columns: Food (food item), CarbonFootprint (kg CO₂-eq per kg), LandUse (m² per kg), WaterUse (liters per kg), Scarcity_weighted water use (liters per kg), Eutrophication (g PO₄-eq per kg)

      • Example: Eggs, 4.67, 6.27, 578, 17983, 21.76
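
    As a quick illustration, here is a minimal Python sketch (assuming pandas and the column names listed above) that loads the comprehensive metrics file and ranks foods by carbon footprint:

        import pandas as pd

        # Load the comprehensive protein-source metrics.
        df = pd.read_csv("protein_source_metrics.csv")

        # Rank foods from lowest to highest carbon footprint to surface
        # low-impact protein sources (e.g., nuts, peas).
        ranked = df.sort_values("CarbonFootprint")
        print(ranked[["Food", "CarbonFootprint", "LandUse", "WaterUse"]].head(10))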

    RDF Graph


    • Filename: food_emissions_graph.ttl

    • Format: Turtle (TTL)

    • Description: A knowledge graph representing a subset of food items (e.g., Beef, Cheese, Eggs) with their environmental metrics as properties (a parsing sketch follows the example below).


    • Structure:
      • Nodes: Food items (e.g., :Beef) typed as :FoodItem.

      • Properties:
        • :hasCarbonFootprint (kg CO₂-eq per kg)

        • :hasLandUse (m² per kg)

        • :hasWaterUse (liters per kg)

        • :hasScarcityWeightedWaterUse (liters per kg)

        • :hasEutrophication (g PO₄-eq per kg)

      • Example (Turtle):

        :Beef rdf:type :FoodItem ;
            :hasCarbonFootprint 33.30 ;
            :hasLandUse 43.24 ;
            :hasWaterUse 2714 ;
            :hasScarcityWeightedWaterUse 119805 ;
            :hasEutrophication 365.29 .
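
    A rough sketch of how this graph can be loaded and inspected in Python, assuming the rdflib library (file name as above):

        from rdflib import Graph

        # Parse the Turtle file into an in-memory RDF graph.
        g = Graph()
        g.parse("food_emissions_graph.ttl", format="turtle")

        # Print every triple to inspect the node-and-property structure
        # (food items typed as :FoodItem with metric properties).
        for subject, predicate, obj in g:
            print(subject, predicate, obj)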

    Key Metrics


    The datasets cover the following environmental impact metrics:


    • Carbon Footprint: GHG emissions in kg CO₂-equivalent per kg of food.

    • Land Use: Area in square meters (m²) required per kg of food.

    • Water Use: Freshwater withdrawals in liters per kg of food.

    • Scarcity-Weighted Water Use: Water use adjusted for regional scarcity, in liters per kg (available for select protein sources).

    • Eutrophication: Nutrient pollution in grams of phosphate-equivalent (g PO₄-eq) per kg (available for select protein sources).

    Usage


    These datasets and the RDF graph were used to:


    1. Build an AI-driven knowledge graph for real-time sustainability analysis of food items.

    2. Enable SPARQL queries to rank food items by environmental impact (e.g., identifying low-carbon protein sources like nuts or peas); a query sketch follows the example applications below.

    3. Compare traditional protein sources (e.g., beef, cheese) with alternatives (e.g., tofu, soy milk).

    4. Support policy recommendations, such as taxing high-impact foods or promoting sustainable alternatives.

    Example applications:


    • Querying :hasCarbonFootprint to identify that beef (99.48 kg CO₂-eq/kg) far exceeds nuts (0.43 kg CO₂-eq/kg).

    • Assessing trade-offs, e.g., cheese’s high water use (5605 liters/kg) vs. soy milk’s low water use (27.8 liters/kg).
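
    A ranking query along these lines can be run with rdflib; this is a sketch, and the prefix IRI below is an assumption that must match the @prefix actually declared in food_emissions_graph.ttl:

        from rdflib import Graph

        g = Graph()
        g.parse("food_emissions_graph.ttl", format="turtle")

        # Rank all food items by carbon footprint, lowest first.
        # The prefix IRI is hypothetical; use the one declared in the file.
        query = """
        PREFIX : <http://example.org/food#>
        SELECT ?food ?cf WHERE {
            ?food a :FoodItem ;
                  :hasCarbonFootprint ?cf .
        }
        ORDER BY ?cf
        """
        for row in g.query(query):
            print(row.food, row.cf)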

    Access and Availability


    All datasets and the RDF graph are available in the Zenodo repository:
    DOI: 10.5281/zenodo.10143973
    The repository includes:


    • Raw CSV files (ghg_emissions_per_kg.csv, freshwater_withdrawals_per_kg.csv, land_use_per_kg.csv, protein_source_metrics.csv).

    • RDF graph file (food_emissions_graph.ttl).

    • Scripts for data integration and knowledge graph construction (see repository for details).

    Limitations


    • Data Scope: Focuses on 2010 global averages, lacking regional or temporal variations.

    • Completeness: Scarcity-weighted water use and eutrophication metrics are available only for a subset of protein sources.

    • Static Nature: Reflects a snapshot from Poore & Nemecek (2018), not real-time data.

    • RDF Coverage: The provided RDF graph includes only 8 food items; the full graph in the study may cover more.

    Funding


    This work was supported by the ‘High Tech for a Sustainable Future’ capacity building programme of the 4TU Federation in the Netherlands.


    Contact


    For questions or further information, please contact the corresponding author:


    • Name: Anand K. Gavai


    • Affiliation: Industrial Engineering & Business Information Systems, University of Twente, Enschede, The Netherlands

    Citation


    If you use this dataset or RDF graph, please cite the original manuscript:
    Gavai, A.K., Vadalkar, S., Sharma, M. (2025). AI-Powered Food Sustainability: Exploiting Knowledge Graphs for Reducing Carbon Footprints and Land Use.
