As post hoc explanations are increasingly used to understand the behavior of Graph Neural Networks (GNNs), it becomes crucial to evaluate the quality and reliability of GNN explanations. However, assessing the quality of GNN explanations is challenging because existing graph datasets have no or unreliable ground-truth explanations for a given task. Here, we introduce a synthetic graph data generator, ShapeGGen, which can generate a variety of benchmark datasets (e.g., varying graph sizes, degree distributions, homophilic vs. heterophilic graphs) accompanied by ground-truth explanations. Further, the flexibility to generate diverse synthetic datasets and corresponding ground-truth explanations allows us to mimic the data generated by various real-world applications. We include ShapeGGen and additional XAI-ready real-world graph datasets in an open-source graph explainability library, GraphXAI. In addition, GraphXAI provides a broader ecosystem of data loaders, data processing functions, synthetic and real-world graph datasets with ground-truth explanations, visualizers, GNN model implementations, and a set of evaluation metrics to benchmark the performance of any given GNN explainer.
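Having ground-truth explanations makes it possible to score an explainer by how well its output overlaps the known answer. The sketch below does not use GraphXAI's or ShapeGGen's own API; it assumes, for illustration, that an explanation is a set of node indices, binarizes a node-importance vector by keeping the top-k nodes (k is a hypothetical choice), and uses the Jaccard index as one common overlap score.

```python
from typing import Iterable, Sequence, Set


def top_k_nodes(scores: Sequence[float], k: int) -> Set[int]:
    """Binarize a node-importance vector by keeping the indices of the k highest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def jaccard_accuracy(predicted: Iterable[int], ground_truth: Iterable[int]) -> float:
    """Jaccard index |P & G| / |P | G| between predicted and ground-truth node sets."""
    p, g = set(predicted), set(ground_truth)
    union = p | g
    if not union:
        return 1.0  # convention chosen here: two empty explanations agree perfectly
    return len(p & g) / len(union)


# Hypothetical importance scores from some explainer, for a 4-node graph
scores = [0.9, 0.1, 0.8, 0.3]
predicted = top_k_nodes(scores, k=2)            # nodes 0 and 2
score = jaccard_accuracy(predicted, {0, 2, 3})  # ground truth marks nodes 0, 2, 3
```

Thresholding by top-k is only one way to binarize; a fixed score cutoff would work with the same scoring function.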
https://www.gnu.org/licenses/lgpl-3.0-standalone.html
Datasets and JSON files (describing the semantic header and the dataset description) used to build an Event Knowledge Graph (EKG) with OCED-PG, as in [1].
Provides input data for 6 datasets (BPIC14, BPIC15, BPIC16, BPIC17, BPIC19 and a simulated library example).
EKGs are built using OCED-PG, implemented in PromG v0.1.25. The source code is available on GitHub.
To build EKGs using OCED-PG
[1] Swevels, A., Fahland, D., Montali, M.: Implementing Object-Centric Event Data Models in Event Knowledge Graphs (2023)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The OpenAIRE Graph is exported as several dumps, so you can download only the parts you are interested in.
publication_[part].tar: metadata records about research literature (includes types of publications listed here)
dataset_[part].tar: metadata records about research data (includes the subtypes listed here)
software.tar: metadata records about research software (includes the subtypes listed here)
otherresearchproduct_[part].tar: metadata records about research products that cannot be classified as research literature, data or software (includes types of products listed here)
organization.tar: metadata records about organizations involved in the research life-cycle, such as universities, research organizations, funders.
datasource.tar: metadata records about data sources whose content is available in the OpenAIRE Graph. They include institutional and thematic repositories, journals, aggregators, funders' databases.
project.tar: metadata records about project grants.
relation_[part].tar: metadata records about relations between entities in the graph.
communities_infrastructures.tar: metadata records about research communities and research infrastructures
Each file is a tar archive containing gz files, each with one JSON record per line. Each JSON record is compliant with the schema available at http://doi.org/10.5281/zenodo.7492151. The documentation for the model is available at https://graph.openaire.eu/docs/data-model/
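Given that layout (tar archive, gzip members, one JSON record per line), a dump part can be streamed without unpacking it to disk. This is a minimal sketch using only the Python standard library; it assumes the gzip members have names ending in `.gz`, and the file name in the usage line is illustrative.

```python
import gzip
import json
import tarfile
from typing import Iterator


def iter_records(tar_path: str) -> Iterator[dict]:
    """Yield one parsed JSON record per line from every .gz member of a dump tar."""
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            # Skip directories and anything that is not a gzip member (assumed naming)
            if not (member.isfile() and member.name.endswith(".gz")):
                continue
            fileobj = tar.extractfile(member)
            if fileobj is None:
                continue
            # Decompress on the fly and decode line by line
            with gzip.open(fileobj, "rt", encoding="utf-8") as lines:
                for line in lines:
                    line = line.strip()
                    if line:
                        yield json.loads(line)
```

Usage: `sum(1 for _ in iter_records("publication_1.tar"))` counts the records in one part. Streaming keeps memory use flat, which matters for the larger parts.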
Learn more about the OpenAIRE Graph at https://graph.openaire.eu.
Discover the graph's content on OpenAIRE EXPLORE and our API for developers.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Current approaches to identifying drug-drug interactions (DDIs), which involve clinical evaluation of drugs and post-marketing surveillance, are unable to provide complete, accurate information, nor do they alert the public to potentially dangerous DDIs before the drugs reach the market. Predicting potential drug-drug interactions helps reduce unanticipated drug interactions and drug development costs, and optimizes the drug design process. Many bioinformatics databases have begun to present their data as Linked Open Data (LOD), a graph data model, using Semantic Web technologies. Knowledge graphs provide a powerful model for defining the data, and their underlying graph structure can be used to extract meaningful information. In this work, we have applied Knowledge Graph (KG) embedding approaches to extract feature vector representations of drugs from LOD in order to predict potential drug-drug interactions. We have investigated the effect of different embedding methods on DDI prediction and showed that knowledge embeddings are powerful predictors, comparable to current state-of-the-art methods for inferring new DDIs. We have applied Logistic Regression, Naive Bayes and Random Forest to the DrugBank KG with traditional 10-fold cross-validation (CV), using RDF2Vec, TransE and TransD embeddings. RDF2Vec with uniform weighting surpasses the other embedding methods.
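The pipeline described above (drug embeddings turned into pair features, then classifiers evaluated with 10-fold CV) can be sketched as follows. This is an illustration, not the authors' code: combining two drug vectors with an element-wise product is an assumed design choice (it makes the feature symmetric in the two drugs), the embedding values are made up, and the splitter is plain, non-stratified k-fold.

```python
import random
from typing import Dict, List, Sequence, Tuple


def pair_features(emb: Dict[str, Sequence[float]], drug_a: str, drug_b: str) -> List[float]:
    """Feature vector for a drug pair: element-wise product of the two embeddings.

    The product is symmetric, so (a, b) and (b, a) give the same vector (assumed choice).
    """
    return [x * y for x, y in zip(emb[drug_a], emb[drug_b])]


def k_fold_splits(n_samples: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices once, deal them into k folds, return one (train, test) pair per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((sorted(train), sorted(test_fold)))
    return splits


# Toy embeddings (hypothetical values standing in for RDF2Vec, TransE or TransD output)
emb = {"drugA": [0.1, 0.4], "drugB": [0.3, -0.2], "drugC": [0.5, 0.5]}
pairs = [("drugA", "drugB"), ("drugA", "drugC"), ("drugB", "drugC")]
X = [pair_features(emb, a, b) for a, b in pairs]
```

On each split, a classifier such as scikit-learn's `LogisticRegression`, `GaussianNB` or `RandomForestClassifier` could be fit on the training rows and scored on the test rows; scikit-learn's `StratifiedKFold` is the usual ready-made alternative when class balance per fold matters.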
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The OpenAIRE Graph is an Open Access dataset containing metadata about research products (literature, datasets, software, etc.) linked to other entities of the research ecosystem like organisations, project grants, and data sources.
The large size of the OpenAIRE Graph is a major impediment for beginners trying to familiarise themselves with the underlying data model and explore its contents. Working with the full Graph typically requires access to a large distributed computing infrastructure, which is not easily accessible to everyone.
The OpenAIRE Beginner’s Kit aims to address this issue. It consists of two components:
A Zeppelin notebook that demonstrates how you can use PySpark to analyse the Graph and answer some interesting research questions: beginners_kit_zeppelin_notebook.json. Here is a guide to Apache Zeppelin.
https://www.archivemarketresearch.com/privacy-policy
Market Analysis
The Open Research Knowledge Graph (ORKG) market is poised for significant growth, driven by the increasing adoption of data governance, data analytics, and knowledge management solutions across various industries. The global ORKG market was valued at USD 687 million in 2023 and is projected to reach USD 1,363 million by 2033, a CAGR of 7.1%. This growth is attributed to organizations' rising need to manage, analyze, and share complex research data and knowledge efficiently.
Key Trends and Drivers
The ORKG market is driven by several key trends, including the increasing adoption of cloud computing, advances in artificial intelligence (AI) and machine learning (ML), and the growing emphasis on data privacy and security. Additionally, the rise of open-source software and the increasing availability of structured and unstructured research data are contributing to market growth. Key players in the ORKG market include IBM, Oracle, Microsoft, AWS, and Neo4j. These companies are investing heavily in research and development to offer innovative solutions that meet the evolving needs of organizations.
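The quoted growth rate can be checked against the two market values with the standard compound annual growth rate formula, (end / start)^(1 / years) - 1. The figures below are the ones stated in the text.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: the constant yearly rate turning start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# USD 687 million (2023) -> USD 1,363 million (2033), as quoted above
rate = cagr(687.0, 1363.0, 2033 - 2023)
print(f"{rate:.1%}")  # → 7.1%
```

The result (about 7.09% per year) is consistent with the stated 7.1%.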
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The OpenAIRE Graph is exported as several datasets, so you can download only the parts you are interested in.
publication_[part].tar: metadata records about research literature (includes types of publications listed here)
dataset_[part].tar: metadata records about research data (includes the subtypes listed here)
software.tar: metadata records about research software (includes the subtypes listed here)
otherresearchproduct_[part].tar: metadata records about research products that cannot be classified as research literature, data or software (includes types of products listed here)
organization.tar: metadata records about organizations involved in the research life-cycle, such as universities, research organizations, funders.
datasource.tar: metadata records about data sources whose content is available in the OpenAIRE Graph. They include institutional and thematic repositories, journals, aggregators, funders' databases.
project.tar: metadata records about project grants.
relation_[part].tar: metadata records about relations between entities in the graph.
communities_infrastructures.tar: metadata records about research communities and research infrastructures
Each file is a tar archive containing gz files, each with one JSON record per line. Each JSON record is compliant with the schema available at http://doi.org/10.5281/zenodo.8238874. The documentation for the model is available at https://graph.openaire.eu/docs/data-model/
Learn more about the OpenAIRE Graph at https://graph.openaire.eu.
Discover the graph's content on OpenAIRE EXPLORE and our API for developers.
https://www.marketresearchintellect.com/privacy-policy
The size and share of the market are categorized by Application (Web Development, Enterprise Applications, Big Data Analytics, IoT, Mobile Apps), by Product (Relational Database Management Systems (RDBMS), NoSQL Databases, NewSQL Databases, Graph Databases, Time-Series Databases), and by geographical region (North America, Europe, Asia-Pacific, South America, and Middle East and Africa).
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The Open Event Knowledge Graph (OEKG) is a novel event-centric knowledge graph that serves as the focal point for integrating the different ESR projects conducted during the CLEOPATRA ITN project. The OEKG makes the extracted information available to the community and accessible for a wide variety of applications and application domains within and beyond the CLEOPATRA ITN.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 2. Comparison results for COD entries.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Three temporal graph datasets for node classification under distribution shift.
DBLP-Easy and DBLP-Hard are citation graph datasets. PharmaBio is a collaboration graph dataset.
Vertices are scientific publications, edges are either citations (DBLP) or at-least-one-common-author relationships (PharmaBio).
The task is to classify the vertices of the graph into the respective conference/journal venues (DBLP) or journal categories (PharmaBio). In the DBLP datasets, new classes may appear over time.
Each dataset follows the structure:
- adjlist.txt -- the graph structure encoded as adjacency lists: in each row, the first entry is the source vertex, the remaining entries are adjacent vertices
- X.npy -- numpy serialized format for node features indexed by node id corresponding to adjlist.txt
- y.npy -- numpy serialized format for node labels indexed by node id corresponding to adjlist.txt
- t.npy -- numpy serialized format for time steps indexed by node id corresponding to adjlist.txt
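A minimal loader for this file layout might look like the sketch below. It assumes adjlist.txt uses whitespace-separated integer node ids and that the `.npy` files hold plain (non-pickled) NumPy arrays. The `time_masks` helper is hypothetical: it only illustrates selecting nodes by time step (earlier nodes for training, nodes at the current step for evaluation) and is not necessarily the exact protocol of the paper cited below.

```python
import os
from typing import Dict, List, Tuple

import numpy as np


def load_adjlist(path: str) -> Dict[int, List[int]]:
    """Parse adjacency lists: first id on a row is the source, the rest are its neighbours."""
    adj: Dict[int, List[int]] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if not parts or parts[0].startswith("#"):
                continue  # skip blank lines and comment lines
            src, *nbrs = (int(p) for p in parts)
            adj.setdefault(src, []).extend(nbrs)
    return adj


def load_dataset(root: str):
    """Load graph structure, features X, labels y and time steps t from one dataset folder."""
    adj = load_adjlist(os.path.join(root, "adjlist.txt"))
    X = np.load(os.path.join(root, "X.npy"))
    y = np.load(os.path.join(root, "y.npy"))
    t = np.load(os.path.join(root, "t.npy"))
    return adj, X, y, t


def time_masks(t: np.ndarray, step) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical split: nodes before `step` for training, nodes at `step` for evaluation."""
    return t < step, t == step
```

Because X, y and t are indexed by node id, the boolean masks can be applied directly, e.g. `X[train_mask]`.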
A paper describing our incremental training and evaluation framework is published in IJCNN 2021 (Pre-print on arXiv: https://arxiv.org/abs/2006.14422).
If you use these datasets in your research, please cite the corresponding paper:
@inproceedings{galke2021lifelong,
author={Galke, Lukas and Franke, Benedikt and Zielke, Tobias and Scherp, Ansgar},
booktitle={2021 International Joint Conference on Neural Networks (IJCNN)},
title={Lifelong Learning of Graph Neural Networks for Open-World Node Classification},
year={2021},
volume={},
number={},
pages={1-8},
doi={10.1109/IJCNN52387.2021.9533412}
}
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This evaluation set was created to evaluate a content-based recommender system in the context of the Open Research Knowledge Graph (ORKG). The recommender system accepts a structured ORKG contribution as input and recommends existing ORKG contributions that are semantically relevant to it.
The evaluation set was manually annotated based on the featured comparisons in the ORKG. In the process, a distinction was made between homogeneous instances (those that differ in 2-3 properties) and heterogeneous instances (all others). Multiple annotations were obtained for the former and exactly one for the latter.
A distinction was also made between "with_response" and "without_response" instances (50 of each). The former are contributions for which the initial version of the contributions similarity service found similar contributions; the latter are those for which it found none.
This evaluation set was created for, and applied to, a modified version of the contributions similarity service in the context of this master's thesis. The modified service simplifies the document representation of contributions stored in an Elasticsearch index by omitting redundant terms.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The SURE-KG RDF dataset provides a knowledge graph built from a real dataset to represent real estate and uncertain spatial data from advertisements. It relies on natural language processing and machine learning methods for information extraction, and on Semantic Web frameworks for representation and integration. It describes more than 100K real estate ads and 6K place names extracted from French real estate advertisements published by various online advertisers and located on the French Riviera. It can be exploited by real estate search engines, real estate professionals, or geographers wishing to analyze local place names.
Homepage: https://github.com/Wimmics/sure
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 3. ChEMBL version 13 entries with identical SMILES.
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for All Sectors; Open Market Paper; Liability, Level (BOGZ1FL893169175A) from 1945 to 2024 about open market paper, liabilities, sector, and USA.
The WaterHydro_DLGSW layer represents surface waters (hydrography) at a scale of RF 100000. WaterHydro_DLGSW was derived from the RF 100000 USGS Digital Line Graph (DLG). DLGs of map features are converted to digital form from maps and related sources. Refer to the USGS website for more information on DLGs (http://www.usgs.gov).
As of June 2024, the most popular database management system (DBMS) worldwide was Oracle, with a ranking score of 1244.08; MySQL and Microsoft SQL Server rounded out the top three. Although the database management industry contains some of the largest companies in the tech industry, such as Microsoft, Oracle and IBM, a number of free and open-source DBMSs such as PostgreSQL and MariaDB remain competitive.
Database Management Systems
As the name implies, DBMSs provide a platform through which developers can organize, update, and control large databases. Given the business world's growing focus on big data and data analytics, knowledge of SQL has become an important asset for software developers around the world, and database management skills are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMSs are also integral to the way consumers access information through applications, which further illustrates the importance of the software.
Attribution 1.0 (CC BY 1.0): https://creativecommons.org/licenses/by/1.0/
License information was derived automatically
Nova Scotia's government has an abundance of data and information resources. This data has been collected and stored on the Nova Scotia Open Data (NSOD) portal (https://data.novascotia.ca) in the form of datasets. The portal is built on and managed with Socrata (https://dev.socrata.com). We transformed the disease-related datasets of Nova Scotia Open Data into RDF, enriched them with a disease ontology, and made the result available under the MIT licence.
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset is part of the bachelor thesis "Evaluating SQuAD-based Question Answering for the Open Research Knowledge Graph Completion". It was created for fine-tuning BERT-based models pre-trained on the SQuAD dataset. The dataset was created using a semi-automatic approach on the ORKG data.
The dataset.csv file contains the entire data (all properties) in tabular form and is not split. The JSON files contain only the fields necessary for training and evaluation, plus additional fields (the start and end indices of the answers in the abstracts). The data in the JSON files is split into training and evaluation sets. We created four variants of the training and evaluation sets, one for each of the question labels ("no label", "how", "what", "which").
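The answer start and end indices are what extractive, SQuAD-style QA training needs. The sketch below shows how such offsets can be derived from an abstract and an answer string; the field names are hypothetical (the actual fields are documented in the thesis), offsets are character positions with an exclusive end, and only the first occurrence of the answer is used.

```python
from typing import Dict


def make_qa_record(context: str, question: str, answer: str) -> Dict[str, object]:
    """Build a SQuAD-style record with character offsets of the answer in the context."""
    start = context.find(answer)  # first occurrence only
    if start == -1:
        raise ValueError("answer text does not occur in the context")
    return {
        "context": context,
        "question": question,
        "answer": answer,
        "answer_start": start,              # offset of the first answer character
        "answer_end": start + len(answer),  # exclusive end offset
    }


record = make_qa_record(
    "We propose a graph-based method for entity linking.",
    "What method is proposed?",
    "a graph-based method",
)
```

With an exclusive end, `context[answer_start:answer_end]` recovers the answer text exactly, which is a cheap sanity check for semi-automatically generated data.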
For detailed information on each of the fields in the dataset, refer to section 4.2 (Corpus) of the Thesis document that can be found in https://www.repo.uni-hannover.de/handle/123456789/12958.
The script used to generate the dataset is available in the public repositories https://github.com/as18cia/thesis_work and https://gitlab.com/TIBHannover/orkg/nlp/experiments/orkg-fine-tuning-squad-based-models
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching 149 zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than 394 zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, driven by increased demand during the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.
Storage capacity also growing
Only a small percentage of this newly created data is kept, though: just two percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase at a compound annual growth rate of 19.2 percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached 6.7 zettabytes.