Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DBpedia graph embeddings using RDF2Vec. RDF2Vec embedding generation code can be found here and is based on a publication by Portisch et al. [1].
The embeddings dataset consists of 200-dimensional vectors of DBpedia entities (from 1/9/2021).
A figure of cosine similarities between a selected set of DBpedia entities is provided in the dataset here.
Generating Embeddings
The code for generating these embeddings can be found here.
Run the run.sh script, which wraps all the commands necessary to generate the embeddings:
bash run.sh
The script downloads a set of DBpedia files, which are listed in dbpedia_files.txt. It then builds a Docker image and runs a container of that image, which generates the embeddings for the DBpedia graph defined by those files.
A folder files is created containing all the downloaded DBpedia files, and a folder embeddings/dbpedia is created containing the embeddings in vectors.txt along with a set of random walk files.
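As a small usage sketch, the resulting vectors.txt can be loaded and compared by cosine similarity in Python. This assumes each line holds an entity identifier followed by whitespace-separated vector components; the entity names used at the end are placeholders, so substitute whatever identifiers your vectors.txt actually contains.

import numpy as np

def load_vectors(path):
    # Parse lines of the form "<entity> c1 c2 ... c200" into a dict of numpy arrays.
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split()
            if len(parts) > 1:
                vectors[parts[0]] = np.array(parts[1:], dtype=float)
    return vectors

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

vecs = load_vectors("embeddings/dbpedia/vectors.txt")
# Placeholder entity identifiers for illustration only.
print(cosine(vecs["dbr:Berlin"], vecs["dbr:Germany"]))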
Run Time of Embeddings Generation
Generating the embeddings can take more than a day, depending on the number of DBpedia files chosen for download. Below are some basic run-time statistics for embeddings generated on a machine with 64 GB of RAM, 8 AMD EPYC cores (roughly 2 GHz), and a 1 TB SSD.
Parameters Used
Listed below are the parameters used to generate the embeddings provided here:
DBpedia (from "DB" for "database") is a project aiming to extract structured content from the information created in the Wikipedia project. DBpedia allows users to semantically query relationships and properties of Wikipedia resources, including links to other related datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset provides embeddings for entities and relations in DBpedia (English) and Wikidata. The two knowledge graphs are first merged using a novel approach that we developed, leveraging the sameAs links between them. Then, we used the state-of-the-art embedding model ConEx to compute embeddings of the merged graph. We call these embeddings universal knowledge graph embeddings.
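The merge is only described at a high level here; purely as an illustration of the general idea (not the authors' actual implementation), sameAs-linked Wikidata URIs can be rewritten to their DBpedia counterparts before the combined triples are handed to an embedding model such as ConEx:

# Minimal sketch of collapsing sameAs-linked entities before embedding.
# The sameAs mapping and triples are illustrative placeholders.
same_as = {
    "http://www.wikidata.org/entity/Q64": "http://dbpedia.org/resource/Berlin",
}

def canonical(uri):
    # Use the DBpedia URI whenever a sameAs link to it exists.
    return same_as.get(uri, uri)

def merge(dbpedia_triples, wikidata_triples):
    merged = set(dbpedia_triples)
    for s, p, o in wikidata_triples:
        merged.add((canonical(s), p, canonical(o)))
    return merged  # one graph over which a knowledge graph embedding model can then be trained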
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A benchmark for Spatial Knowledge Graph Completion (SKGC) extracted from DBpedia.
It can be used to evaluate Spatial Knowledge Graph Embedding or Completion methods.
The S-DBpedia benchmark provides two groups of dataset variants.
Data scale: S-DBpedia_small, S-DBpedia_medium, S-DBpedia_large, S-DBpedia.
Data sparsity: S-DBpedia_GT5E, S-DBpedia_GT10E, S-DBpedia_GT20E, S-DBpedia_GT50E.
We extracted all attributes of the entities in the dataset from DBpedia; they cover text, numerical, and image information. The data here includes the dataset files (under the dataset name) and all entity attribute files (Attribute.tar.gz).
For the dataset's construction code, evaluation, and detailed usage instructions, please see https://github.com/NEU-IDKE/S-DBpedia.
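The exact file layout is documented in the repository linked above; as a rough sketch only, assuming the tab-separated head/relation/tail split files common to knowledge graph completion benchmarks, a split could be read like this:

def read_triples(path):
    # One triple per line, assumed to be tab-separated: head, relation, tail.
    with open(path, encoding="utf-8") as f:
        return [tuple(line.rstrip("\n").split("\t")) for line in f if line.strip()]

# Hypothetical path; check the S-DBpedia repository for the actual file names.
train = read_triples("S-DBpedia_small/train.txt")
print(len(train), "training triples")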
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
The core version of DBpedia has too many entities and statements to train recommendation models in a reasonable time frame, which is why we created two subsets (DB1M and DBA240) of the core version of DBpedia from September 2022.
File structure
Each dataset is located in its own folder with the following files:
A new benchmark dataset for simple question answering over knowledge graphs that was created by mapping SimpleQuestions entities and predicates from Freebase to DBpedia.
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
DBpedia Classes
DBpedia is a knowledge graph extracted from Wikipedia, providing structured data about real-world entities and their relationships. DBpedia Classes are the core building blocks of this knowledge graph, representing different categories or types of entities.
Key Concepts:
Entity: A real-world object, such as a person, place, thing, or concept.
Class: A group of entities that share common properties or characteristics.
Instance: A specific member of a class.
Examples of DBpedia Classes:
Person: Represents individuals, e.g., "Barack Obama," "Albert Einstein."
Place: Represents locations, e.g., "Paris," "Mount Everest."
Organization: Represents groups, institutions, or companies, e.g., "Google," "United Nations."
Event: Represents occurrences, e.g., "World Cup," "French Revolution."
Artwork: Represents creative works, e.g., "Mona Lisa," "Star Wars."
Hierarchy and Relationships:
DBpedia classes often have a hierarchical structure, where subclasses inherit properties from their parent classes. For example, the class "Person" might have subclasses like "Politician," "Scientist," and "Artist."
Relationships between classes are also important. For instance, a "Person" might have a "birthPlace" relationship with a "Place," or an "Artist" might have a "hasArtwork" relationship with an "Artwork."
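To make classes, instances, and relationships concrete, here is a small sketch using the SPARQLWrapper library against the public DBpedia endpoint (endpoint availability may vary); it retrieves a few instances of the class Person together with the Place each was born in:

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
# Persons (instances of the class dbo:Person) and their birthPlace relationship.
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?person ?place WHERE {
        ?person a dbo:Person ;
                dbo:birthPlace ?place .
    } LIMIT 5
""")
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["person"]["value"], "->", row["place"]["value"])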
Applications of DBpedia Classes:
Semantic Search: DBpedia classes can be used to enhance search results by understanding the context and meaning of queries.
Knowledge Graph Construction: DBpedia classes form the foundation of knowledge graphs, which can be used for applications such as question answering, recommendation systems, and data integration.
Data Analysis: DBpedia classes can be used to analyze and extract insights from large datasets.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains the vectors from computing RDF2vec embeddings from an inverse predicate frequency weighted DBpedia 2016-04 graph.
For each entity in the graph, the text file in the zip archive contains a line with the entity name and the embedded vector.
The parameter settings for the embedding are as specified in the paper:
Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, and Heiko Paulheim. 2017. Biased graph walks for RDF graph embeddings. In Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics (WIMS '17). ACM, New York, NY, USA, Article 21, 12 pages. DOI: https://doi.org/10.1145/3102254.3102279
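As a hedged usage sketch (the archive and member names below are placeholders), the per-entity vectors can be read directly from the zip archive without unpacking it first:

import io
import zipfile
import numpy as np

def load_from_zip(zip_path):
    vectors = {}
    with zipfile.ZipFile(zip_path) as zf:
        member = zf.namelist()[0]  # assumes the archive holds a single text file
        with io.TextIOWrapper(zf.open(member), encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split()
                if len(parts) > 1:
                    vectors[parts[0]] = np.array(parts[1:], dtype=float)
    return vectors

vecs = load_from_zip("rdf2vec_dbpedia_2016-04.zip")  # placeholder file name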
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains the vectors from computing RDF2vec embeddings from a Page Rank weighted DBpedia 2016-04 graph.
For each entity in the graph, the text file in the zip archive contains a line with the entity name and the embedded vector.
The parameter settings for the embedding are as specified in the paper:
Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, and Heiko Paulheim. 2017. Biased graph walks for RDF graph embeddings. In Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics (WIMS '17). ACM, New York, NY, USA, Article 21, 12 pages. DOI: https://doi.org/10.1145/3102254.3102279
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CaLiGraph is a large-scale semantic knowledge graph with a rich ontology compiled from the DBpedia ontology and from Wikipedia categories and list pages. For more information, visit http://caligraph.org.
Information about the uploaded files (all files are bzip2-compressed and in N-Triples format):
caligraph-metadata.nt.bz2 Metadata about the dataset which is described using void vocabulary.
caligraph-ontology.nt.bz2 Class definitions, property definitions, restrictions, and labels of the CaLiGraph ontology.
caligraph-ontology_dbpedia-mapping.nt.bz2 Mapping of classes and properties to the DBpedia ontology.
caligraph-ontology_provenance.nt.bz2 Provenance information about classes (i.e. which Wikipedia category or list page has been used to create this class).
caligraph-instances_types.nt.bz2 Definition of instances and (non-transitive) types.
caligraph-instances_transitive-types.nt.bz2 Transitive types for instances (can also be induced by a reasoner).
caligraph-instances_labels.nt.bz2 Labels for instances.
caligraph-instances_relations.nt.bz2 Relations between instances derived from the class restrictions of the ontology (can also be induced by a reasoner).
caligraph-instances_dbpedia-mapping.nt.bz2 Mapping of instances to respective DBpedia instances.
caligraph-instances_provenance.nt.bz2 Provenance information about instances (e.g. if the instance has been extracted from a Wikipedia list page).
dbpedia_caligraph-instances.nt.bz2 Additional instances of CaLiGraph that are not in DBpedia. Note: this file is not part of CaLiGraph but should rather be used as an extension to DBpedia; the triples use the DBpedia namespace and can thus directly extend DBpedia.
dbpedia_caligraph-types.nt.bz2 Additional types of CaLiGraph that are not in DBpedia. Note: this file is not part of CaLiGraph but should rather be used as an extension to DBpedia; the triples use the DBpedia namespace and can thus directly extend DBpedia.
dbpedia_caligraph-relations.nt.bz2 Additional relations of CaLiGraph that are not in DBpedia. Note: this file is not part of CaLiGraph but should rather be used as an extension to DBpedia; the triples use the DBpedia namespace and can thus directly extend DBpedia.
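Because every file is bzip2-compressed N-Triples, it can be inspected as a stream without decompressing it to disk; a minimal Python sketch, using one of the file names listed above:

import bz2

# Stream the instance labels file; each line is one N-Triples statement.
with bz2.open("caligraph-instances_labels.nt.bz2", mode="rt", encoding="utf-8") as f:
    for i, line in enumerate(f):
        print(line.rstrip())
        if i == 4:  # show only the first five triples
            break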
Changelog
v3.1.1
Fixed an encoding issue in caligraph-ontology.nt.bz2
v3.1.0
Fixed several issues related to ontology consistency and structure
v3.0.0
Added functionality to group mentions of unknown entities into distinct entities
v2.1.0
Fixed an error that led to a class inheriting from a disjoint class
Introduced owl:ObjectProperty and owl:DataProperty instead of rdf:Property
Several cosmetic fixes
v2.0.2
Fixed incorrect formatting of some properties
v2.0.1
Better entity extraction and representation
Small cosmetic fixes
v2.0.0
Entity extraction from arbitrary tables and enumerations in Wikipedia pages
v1.4.0
BERT-based recognition of subject entities and improved language models from spaCy 3.0
v1.3.1
Fixed minor encoding errors and improved formatting
v1.3.0
CaLiGraph is now based on a recent version of Wikipedia and DBpedia from November 2020
v1.1.0
Improved the CaLiGraph type hierarchy
Many small bugfixes and improvements
v1.0.9
Additional alternative labels for CaLiGraph instances
v1.0.8
Small cosmetic changes to URIs to be closer to DBpedia URIs
v1.0.7
Mappings from CaLiGraph classes to DBpedia classes are now realised via rdfs:subClassOf instead of owl:equivalentClass
Entities are now URL-encoded to improve accessibility
v1.0.6
Fixed a bug in the ontology creation step that led to a substantially lower number of sub-type relationships than actually exist. The new version provides a richer type hierarchy, which also leads to an increased number of types for resources.
v1.0.5
Fixed a bug that declared CaLiGraph predicates as subclasses of owl:Predicate instead of instances of the type owl:Predicate.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets that can be used together with the code at https://github.com/JanKalo/RuleAlign.
This dataset contains the vectors from computing RDF2vec embeddings from an inverse Page Rank frequency weighted DBpedia 2016-04 graph.
For each entity in the graph, the text file in the zip archive contains a line with the entity name and the embedded vector.
The parameter settings for the embedding are as specified in the paper:
Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, and Heiko Paulheim. 2017. Biased graph walks for RDF graph embeddings. In Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics (WIMS '17). ACM, New York, NY, USA, Article 21, 12 pages. DOI: https://doi.org/10.1145/3102254.3102279
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
QBLink-KG is a modified version of QBLink, a high-quality benchmark for evaluating conversational understanding of Wikipedia content. QBLink consists of sequences of up to three hand-crafted queries, with responses being single named entities that match the titles of Wikipedia articles.
For QBLink-KG, the English subset of the DBpedia snapshot from September 2021 was used as the target knowledge graph. QBLink answers, provided as the titles of Wikipedia infoboxes, can easily be mapped to DBpedia entity URIs (if the corresponding entities are present in DBpedia), since DBpedia is constructed by extracting information from Wikipedia infoboxes.
QBLink, in its original format, is not directly applicable to Conversational Entity Retrieval from a Knowledge Graph (CER-KG), because knowledge graphs contain considerably less information than Wikipedia, and a named entity serving as the answer to a QBLink query may not be present as an entity in DBpedia. To adapt QBLink for CER over DBpedia, we applied two filtering steps: 1) we removed all queries for which the wiki_page field is empty, or whose answer cannot be mapped to a DBpedia entity or does not match a Wikipedia page; 2) for the evaluation of a model with specific techniques for entity linking and candidate selection, we excluded queries with answers that do not belong to the set of candidate entities derived using that model.
The original QBLink dataset files before filtering are:
QBLink-train.json
QBLink-dev.json
QBLink-test.json
The final QBLink-KG files after filtering are:
QBLink-Filtered-train.json
QBLink-Filtered-dev.json
QBLink-Filtered-test.json
We used the following references to construct QBLink-KG:
Ahmed Elgohary, Chen Zhao, and Jordan Boyd-Graber. 2018. A dataset and baselines for sequential open-domain question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1077–1083, Brussels, Belgium. Association for Computational Linguistics.
https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2021-09
Lehmann, Jens, et al. 2015. "DBpedia – A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia": 167–195.
For more details about QBLink-KG, please read our research paper:
Zamiri, Mona, et al. "Benchmark and Neural Architecture for Conversational Entity Retrieval from a Knowledge Graph." The Web Conference 2024.
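As an illustration of the first filtering step (a sketch only; the exact JSON schema should be checked against the released files, and the assumption here is that each file holds a list of query records carrying the wiki_page field mentioned above):

import json

with open("QBLink-train.json", encoding="utf-8") as f:
    records = json.load(f)  # assumed to be a list of query records

# Drop records whose wiki_page field is empty, mirroring filtering step 1 above.
kept = [r for r in records if r.get("wiki_page")]
print(len(records), "records before filtering,", len(kept), "after")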
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains the vectors from computing RDF2vec embeddings from an inverse page rank split weighted DBpedia 2016-04 graph.
For each entity in the graph, the text file in the zip archive contains a line with the entity name and the embedded vector.
The parameter settings for the embedding are as specified in the paper:
Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, and Heiko Paulheim. 2017. Biased graph walks for RDF graph embeddings. In Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics (WIMS '17). ACM, New York, NY, USA, Article 21, 12 pages. DOI: https://doi.org/10.1145/3102254.3102279
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository includes all code and data for causal inference over knowledge graphs. It includes experiments on four datasets: the synthetic review dataset, the open review dataset, the subset of DBpedia related to writers, and MIMIC-III (for which we do not provide the data due to confidentiality constraints).
This dataset contains the vectors from computing KGloVe embeddings from a Page Rank split frequency weighted DBpedia 2016-04 graph. For each entity in the graph, the text file in the zip archive contains a line with the entity name and the embedded vector. The parameter settings for the embedding are as specified in the paper: Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, and Heiko Paulheim. 2017. Global RDF Vector Space Embeddings. In The Semantic Web – ISWC 2017: 16th International Semantic Web Conference, Vienna, Austria, October 21–25, 2017, Proceedings, Part I.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
QALD-9-Plus is a dataset for Knowledge Graph Question Answering (KGQA) based on the well-known QALD-9. QALD-9-Plus enables training and testing of KGQA systems over DBpedia and Wikidata using questions in 8 different languages. Some of the questions have several alternative formulations in particular languages, which makes it possible to evaluate the robustness of KGQA systems and to train paraphrasing models. As the questions' translations were provided by native speakers, they are considered a "gold standard"; therefore, machine translation tools can also be trained and evaluated on the dataset. Please see also the GitHub repository: https://github.com/Perevalov/qald_9_plus
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains the vectors from computing KGloVe embeddings from a uniformly weighted DBpedia 2016-04 graph.
For each entity in the graph, the text file in the zip archive contains a line with the entity name and the embedded vector.
The parameter settings for the embedding are as specified in the paper:
Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, and Heiko Paulheim. 2017. Global RDF Vector Space Embeddings. In The Semantic Web – ISWC 2017: 16th International Semantic Web Conference, Vienna, Austria, October 21–25, 2017, Proceedings, Part I.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains the vectors from computing RDF2vec embeddings from a uniformly weighted DBpedia 2016-04 graph.
For each entity in the graph, the text file in the zip archive contains a line with the entity name and the embedded vector.
The parameter settings for the embedding are as specified in the paper:
Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, and Heiko Paulheim. 2017. Biased graph walks for RDF graph embeddings. In Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics (WIMS '17). ACM, New York, NY, USA, Article 21, 12 pages. DOI: https://doi.org/10.1145/3102254.3102279
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Large knowledge graphs like DBpedia and YAGO are always based on the same source, namely Wikipedia. But there are many more wikis that contain information about long-tail entities, for example those on wiki hosting platforms like Fandom. In this paper, we present the approach and analysis of DBkWik++, a fused knowledge graph built from thousands of wikis. A modified version of the DBpedia framework is applied to each wiki, which results in many isolated knowledge graphs. With an incremental merge-based approach, we reuse one-to-one matching systems to solve the multi-source KG matching task. Based on this alignment, we create a consolidated knowledge graph with more than 15 million instances.