59 datasets found
  1. DBpedia RDF2Vec Graph Embeddings

    • zenodo.org
    pdf, zip
    Updated Jul 17, 2024
    Martin Pekár Christensen; Matteo Lissandrini; Katja Hose (2024). DBpedia RDF2Vec Graph Embeddings [Dataset]. http://doi.org/10.5281/zenodo.6384728
    Available download formats
    pdf, zip
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Martin Pekár Christensen; Matteo Lissandrini; Katja Hose
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    DBpedia graph embeddings using RDF2Vec. RDF2Vec embedding generation code can be found here and is based on a publication by Portisch et al. [1].

    The embeddings dataset consists of 200-dimensional vectors of DBpedia entities (from 1/9/2021).

    A figure of cosine similarities between a selected set of DBpedia entities is provided in the dataset here.

    Generating Embeddings

    The code for generating these embeddings can be found here.

    Run the run.sh script, which wraps all the commands necessary to generate the embeddings:

    bash run.sh

    The script downloads a set of DBpedia files, which are listed in dbpedia_files.txt. It then builds a Docker image and runs a container of that image that generates the embeddings for the DBpedia graph defined by the DBpedia files.

    A folder named files is created containing all the downloaded DBpedia files, and a folder named embeddings/dbpedia is created containing the embeddings in vectors.txt along with a set of random walk files.
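
    As a rough illustration of how the resulting vectors.txt could be consumed, the sketch below assumes (this is not confirmed by the dataset description) that each line holds an entity identifier followed by whitespace-separated floats. The helper names load_vectors and cosine are made up for this example, and the three-dimensional sample stands in for the real 200-dimensional vectors.

```python
import math
from typing import Dict, Iterable, List


def load_vectors(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse 'entity v1 v2 ... vN' lines into a dict (assumed file layout)."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy stand-in for the real file; the values are invented.
sample = [
    "http://dbpedia.org/resource/Berlin 1.0 0.0 0.0",
    "http://dbpedia.org/resource/Paris 1.0 0.0 0.0",
    "http://dbpedia.org/resource/Banana 0.0 1.0 0.0",
]
vecs = load_vectors(sample)
```

    With the real output, the same function could be fed an open file handle, for example open("embeddings/dbpedia/vectors.txt", encoding="utf-8"), provided the layout assumption holds.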

    Run Time of Embeddings Generation

    Generating the embeddings can take more than a day, depending on how many DBpedia files are downloaded. The following run time statistics were measured on a machine with 64 GB RAM, 8 cores (AMD EPYC, 1996.221 MHz), and a 1 TB SSD.

    • Total: 1 day, 8 hours, 52 minutes, 41 seconds
    • Walk generation: 0 days, 7 hours, 24 minutes, 36 seconds
    • Training: 1 day, 1 hour, 28 minutes, 5 seconds
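
    The run times above can be cross-checked with a little arithmetic. Reading the walk generation time as 7 hours, 24 minutes, 36 seconds, the two phases add up exactly to the reported total; the variable names below are only illustrative.

```python
from datetime import timedelta

# Phase durations as listed above (walk generation read as 7 h 24 min 36 s).
walk_generation = timedelta(hours=7, minutes=24, seconds=36)
training = timedelta(days=1, hours=1, minutes=28, seconds=5)

# Reported total run time.
total = timedelta(days=1, hours=8, minutes=52, seconds=41)

phases_match_total = (walk_generation + training) == total
print(phases_match_total)
```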

    Parameters Used

    The following parameters were used to generate the embeddings provided here:

    • Number of walks per entity: 100
    • Depth (hops) per walk: 4
    • Walk generation mode: RANDOM_WALKS_DUPLICATE_FREE
    • Threads: # of processors / 2
    • Training mode: sg
    • Embeddings vector dimension: 200
    • Minimum word2vec word count: 1
    • Sample rate: 0.0
    • Training window size: 5
    • Training epochs: 5
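
    To make the walk parameters more concrete, here is a minimal sketch of duplicate-free random walk sampling on a toy graph. It is an assumption that the RANDOM_WALKS_DUPLICATE_FREE mode behaves this way (distinct walks per start entity, each hop appending a predicate and an object); the function name, the retry bound, and the toy graph are all invented for illustration and are not taken from the generation code.

```python
import random
from typing import Dict, List, Set, Tuple

# Toy graph: subject -> list of (predicate, object) edges. All names are made up.
Graph = Dict[str, List[Tuple[str, str]]]


def duplicate_free_walks(graph: Graph, start: str, num_walks: int,
                         depth: int, rng: random.Random) -> List[Tuple[str, ...]]:
    """Sample up to num_walks distinct walks of at most `depth` hops from start."""
    walks: Set[Tuple[str, ...]] = set()
    attempts = 0
    # Bound the attempts: a small graph may have fewer distinct walks than requested.
    while len(walks) < num_walks and attempts < num_walks * 10:
        attempts += 1
        walk = [start]
        node = start
        for _ in range(depth):
            edges = graph.get(node)
            if not edges:
                break  # dead end: keep the shorter walk
            predicate, node = rng.choice(edges)
            walk.extend([predicate, node])
        walks.add(tuple(walk))  # the set discards duplicate walks
    return sorted(walks)


toy_graph: Graph = {
    "Berlin": [("country", "Germany"), ("type", "City")],
    "Germany": [("capital", "Berlin")],
}
# 100 walks per entity and depth 4, mirroring the parameters listed above.
walks = duplicate_free_walks(toy_graph, "Berlin", 100, 4, random.Random(0))
```

    Each walk would then be treated as one "sentence" for word2vec-style training; on this toy graph far fewer than 100 distinct walks exist, which is why the attempt bound is needed.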

  2. DBpedia Dataset

    • paperswithcode.com
    • opendatalab.com
    Sören Auer; Christian Bizer; Georgi Kobilarov; Jens Lehmann; Richard Cyganiak; Zachary G. Ives, DBpedia Dataset [Dataset]. https://paperswithcode.com/dataset/dbpedia
    Authors
    Sören Auer; Christian Bizer; Georgi Kobilarov; Jens Lehmann; Richard Cyganiak; Zachary G. Ives
    Description

    DBpedia (from "DB" for "database") is a project aiming to extract structured content from the information created in the Wikipedia project. DBpedia allows users to semantically query relationships and properties of Wikipedia resources, including links to other related datasets.

  3. Data from: Universal Knowledge Graph Embeddings

    • zenodo.org
    bin, zip
    Updated Feb 2, 2023
    N'Dah Jean Kouagou; Caglar Demir; M. Hamada Zahera; Stefan Heindorf; Jiayi Li; Axel-Cyrille Ngonga Ngomo (2023). Universal Knowledge Graph Embeddings [Dataset]. http://doi.org/10.5281/zenodo.7503097
    Available download formats
    zip, bin
    Dataset updated
    Feb 2, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    N'Dah Jean Kouagou; Caglar Demir; M. Hamada Zahera; Stefan Heindorf; Jiayi Li; Axel-Cyrille Ngonga Ngomo
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset provides embeddings for entities and relations in DBpedia (English) and Wikidata. The two knowledge graphs are first merged using a novel approach that we developed by leveraging the sameAs links between them. Then, we used the state-of-the-art embedding model ConEx to compute embeddings of the merge. Our embeddings are called universal knowledge graph embeddings.

  4. S-DBpedia: A benchmark dataset and evaluation for spatial knowledge graph...

    • data.niaid.nih.gov
    Updated Jan 27, 2024
    Li, Qinghui (2024). S-DBpedia: A benchmark dataset and evaluation for spatial knowledge graph completion [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7431612
    Dataset updated
    Jan 27, 2024
    Dataset provided by
    Zhang, Fu
    Cheng, Jingwei
    Chen, Ming
    Mao, Chaoyuan
    Li, Qinghui
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A benchmark for Spatial Knowledge Graph Completion (SKGC) extracted from DBpedia.

    It can be used to evaluate Spatial Knowledge Graph Embedding or Completion methods.

    The S-DBpedia baseline dataset contains two types of datasets.

    Data scale: S-DBpedia_small, S-DBpedia_medium, S-DBpedia_large, S-DBpedia.

    Data Sparsity: S-DBpedia_GT5E, S-DBpedia_GT10E, S-DBpedia_GT20E, S-DBpedia_GT50E.

    We extracted all attributes of entities in the dataset from DBpedia. It contains text, numerical, and image information. The data here includes the dataset file (dataset name) and all attribute files of the entity (Attribute.tar.gz).

    For the construction code, evaluation and detailed usage instructions of the dataset, please see https://github.com/NEU-IDKE/S-DBpedia.

  5. CollabRec: DBpedia Subgraphs (2022-09)

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Mar 28, 2023
    Anonymous (2023). CollabRec: DBpedia Subgraphs (2022-09) [Dataset]. http://doi.org/10.5281/zenodo.7772596
    Available download formats
    zip
    Dataset updated
    Mar 28, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0), https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    The core version of DBpedia has too many entities and statements to train recommendation models in a reasonable time frame, which is why we created two subsets (DB1M and DBA240) of the core version of DBpedia from September 2022.

    File structure
    Each dataset is located in its own folder with the following files:

    • index.tsv.gz is a file in tabular format that maps a simple integer to a URI, which identifies an entity in the KG.
    • index_labels.tsv.gz is a file that links entities (represented by their index number) to their label and description.
    • relevant_entities.tsv.gz is a file with all the entities that occur as subject and/or object in statements of the subsampled KG.
    • statements.tsv.gz is a file with all the statements of the subsampled KG. The first column contains the subjects, second column the predicates, and the third column the objects. All those entities are represented by their index number (see index.tsv.gz) and not their URI.
    • statements.nt.gz is a file with all the statements of the subsampled KG in N-Triples format.
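
    The file layout above suggests a simple way to turn indexed statements back into URIs. The sketch below assumes that both files are headerless, that index.tsv.gz has the columns (integer, URI), and that predicates are indexed like entities; none of this is confirmed beyond the description above. The function names and sample rows are invented, and in-memory gzip data is used so the example needs no downloaded files.

```python
import csv
import gzip
import io
from typing import Dict, Iterator, TextIO, Tuple


def read_index(stream: TextIO) -> Dict[int, str]:
    """Map integer IDs to URIs (assumed columns: id, URI; no header row)."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {int(row[0]): row[1] for row in reader if len(row) >= 2}


def read_statements(stream: TextIO) -> Iterator[Tuple[int, int, int]]:
    """Yield (subject, predicate, object) index triples from a TSV stream."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) >= 3:
            yield int(row[0]), int(row[1]), int(row[2])


def gz_text(text: str) -> TextIO:
    """Open an in-memory gzip payload in text mode (stand-in for a real file)."""
    payload = io.BytesIO(gzip.compress(text.encode("utf-8")))
    return gzip.open(payload, "rt", encoding="utf-8", newline="")


# Invented sample rows in the assumed layout.
index = read_index(gz_text(
    "0\thttp://dbpedia.org/resource/Berlin\n"
    "1\thttp://dbpedia.org/ontology/country\n"
    "2\thttp://dbpedia.org/resource/Germany\n"
))
triples = [(index[s], index[p], index[o])
           for s, p, o in read_statements(gz_text("0\t1\t2\n"))]
```

    For the real files, gzip.open("index.tsv.gz", "rt", encoding="utf-8", newline="") could replace the gz_text helper, assuming the layout matches.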

  6. SimpleDBpediaQA Dataset

    • paperswithcode.com
    • opendatalab.com
    Michael Azmy; Peng Shi; Jimmy Lin; Ihab Ilyas, SimpleDBpediaQA Dataset [Dataset]. https://paperswithcode.com/dataset/simpledbpediaqa
    Authors
    Michael Azmy; Peng Shi; Jimmy Lin; Ihab Ilyas
    Description

    A new benchmark dataset for simple question answering over knowledge graphs that was created by mapping SimpleQuestions entities and predicates from Freebase to DBpedia.

  7. Classes Knowledge Graph

    • kaggle.com
    Updated Aug 31, 2024
    Afroz (2024). Classes Knowledge Graph [Dataset]. https://www.kaggle.com/datasets/pythonafroz/dbpedia-classes-knowledge-graph/code
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 31, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Afroz
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    DBpedia Classes

    DBpedia is a knowledge graph extracted from Wikipedia, providing structured data about real-world entities and their relationships. DBpedia Classes are the core building blocks of this knowledge graph, representing different categories or types of entities.

    Key Concepts:

    • Entity: A real-world object, such as a person, place, thing, or concept.
    • Class: A group of entities that share common properties or characteristics.
    • Instance: A specific member of a class.

    Examples of DBpedia Classes:

    • Person: Represents individuals, e.g., "Barack Obama," "Albert Einstein."
    • Place: Represents locations, e.g., "Paris," "Mount Everest."
    • Organization: Represents groups, institutions, or companies, e.g., "Google," "United Nations."
    • Event: Represents occurrences, e.g., "World Cup," "French Revolution."
    • Artwork: Represents creative works, e.g., "Mona Lisa," "Star Wars."

    Hierarchy and Relationships:

    DBpedia classes often have a hierarchical structure, where subclasses inherit properties from their parent classes. For example, the class "Person" might have subclasses like "Politician," "Scientist," and "Artist."

    Relationships between classes are also important. For instance, a "Person" might have a "birthPlace" relationship with a "Place," or an "Artist" might have a "hasArtwork" relationship with an "Artwork."

    Applications of DBpedia Classes:

    Semantic Search: DBpedia classes can be used to enhance search results by understanding the context and meaning of queries.

    Knowledge Graph Construction: DBpedia classes form the foundation of knowledge graphs, which can be used for various applications like question answering, recommendation systems, and data integration.

    Data Analysis: DBpedia classes can be used to analyze and extract insights from large datasets.

  8. RDF2Vec DBpedia inverse predicate frequency embeddings

    • zenodo.org
    • explore.openaire.eu
    zip
    Updated Jan 24, 2020
    Michael Cochez; Petar Ristoski; Simone Paulo Ponzetto; Heiko Paulheim (2020). RDF2Vec DBpedia inverse predicate frequency embeddings [Dataset]. http://doi.org/10.5281/zenodo.1320007
    Available download formats
    zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Michael Cochez; Petar Ristoski; Simone Paulo Ponzetto; Heiko Paulheim
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains the vectors from computing RDF2vec embeddings from an inverse predicate frequency weighted DBpedia 2016-04 graph.

    For each entity in the graph, the text file in the zip archive contains a line with the entity name and the embedded vector.

    The parameter settings for the embedding are as specified in the paper:

    Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, and Heiko Paulheim. 2017. Biased graph walks for RDF graph embeddings. In Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics (WIMS '17). ACM, New York, NY, USA, Article 21, 12 pages. DOI: https://doi.org/10.1145/3102254.3102279
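
    Since the vectors are shipped as one text file inside a zip archive, they can be streamed without unpacking the archive first. This is a hedged sketch: it assumes the archive's first member is the text file and that each line is the entity name followed by whitespace-separated floats (the exact separator is not stated above). The function name iter_vectors and the two sample lines are invented for the demonstration.

```python
import io
import zipfile
from typing import Iterator, List, Tuple


def iter_vectors(zip_source) -> Iterator[Tuple[str, List[float]]]:
    """Yield (entity, vector) pairs from the first member of a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        member = archive.namelist()[0]  # assumption: a single text file
        with archive.open(member) as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                parts = line.split()
                if len(parts) > 1:
                    yield parts[0], [float(x) for x in parts[1:]]


# Build a tiny in-memory archive with invented values to exercise the reader.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("vectors.txt", "dbr:Berlin 0.1 0.2\ndbr:Paris 0.3 0.4\n")
buffer.seek(0)

rows = list(iter_vectors(buffer))
```

    Passing a file path instead of the in-memory buffer works the same way, because zipfile.ZipFile accepts either.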

  9. RDF2Vec DBpedia Page Rank embeddings

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Ponzetto, Simone Paulo (2020). RDF2Vec DBpedia Page Rank embeddings [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1320037
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Ponzetto, Simone Paulo
    Ristoski, Petar
    Paulheim, Heiko
    Cochez, Michael
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains the vectors from computing RDF2vec embeddings from a Page Rank weighted DBpedia 2016-04 graph.

    For each entity in the graph, the text file in the zip archive contains a line with the entity name and the embedded vector.

    The parameter settings for the embedding are as specified in the paper:

    Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, and Heiko Paulheim. 2017. Biased graph walks for RDF graph embeddings. In Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics (WIMS '17). ACM, New York, NY, USA, Article 21, 12 pages. DOI: https://doi.org/10.1145/3102254.3102279

  10. CaLiGraph - A Large-Scale Semantic Knowledge Graph compiled from Wikipedia...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 25, 2023
    Paulheim, Heiko (2023). CaLiGraph - A Large-Scale Semantic Knowledge Graph compiled from Wikipedia Categories and List Pages [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3484511
    Dataset updated
    Jun 25, 2023
    Dataset provided by
    Heist, Nicolas
    Paulheim, Heiko
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CaLiGraph is a large-scale semantic knowledge graph with a rich ontology which is compiled from the DBpedia ontology, and Wikipedia categories & list pages. For more information, visit http://caligraph.org

    Information about uploaded files: (all files are b-zipped and in the n-triple format)

    caligraph-metadata.nt.bz2 Metadata about the dataset, described using the VoID vocabulary.

    caligraph-ontology.nt.bz2 Class definitions, property definitions, restrictions, and labels of the CaLiGraph ontology.

    caligraph-ontology_dbpedia-mapping.nt.bz2 Mapping of classes and properties to the DBpedia ontology.

    caligraph-ontology_provenance.nt.bz2 Provenance information about classes (i.e. which Wikipedia category or list page has been used to create this class).

    caligraph-instances_types.nt.bz2 Definition of instances and (non-transitive) types.

    caligraph-instances_transitive-types.nt.bz2 Transitive types for instances (can also be induced by a reasoner).

    caligraph-instances_labels.nt.bz2 Labels for instances.

    caligraph-instances_relations.nt.bz2 Relations between instances derived from the class restrictions of the ontology (can also be induced by a reasoner).

    caligraph-instances_dbpedia-mapping.nt.bz2 Mapping of instances to respective DBpedia instances.

    caligraph-instances_provenance.nt.bz2 Provenance information about instances (e.g. if the instance has been extracted from a Wikipedia list page).

    dbpedia_caligraph-instances.nt.bz2 Additional instances of CaLiGraph that are not in DBpedia. ! This file is not part of CaLiGraph but should rather be used as an extension to DBpedia. The triples use the DBpedia namespace and can thus be used to directly extend DBpedia. !

    dbpedia_caligraph-types.nt.bz2 Additional types of CaLiGraph that are not in DBpedia. ! This file is not part of CaLiGraph but should rather be used as an extension to DBpedia. The triples use the DBpedia namespace and can thus be used to directly extend DBpedia. !

    dbpedia_caligraph-relations.nt.bz2 Additional relations of CaLiGraph that are not in DBpedia. ! This file is not part of CaLiGraph but should rather be used as an extension to DBpedia. The triples use the DBpedia namespace and can thus be used to directly extend DBpedia. !
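
    Because all files are bzip2-compressed N-Triples, they can be read line by line without decompressing them to disk. The sketch below uses a deliberately simplified regular expression that only matches triples whose subject, predicate, and object are all IRIs; literals and blank nodes are skipped, so it is not a full N-Triples parser. The function name and the two sample triples (including the resource names) are invented for illustration.

```python
import bz2
import io
import re
from typing import Iterable, Iterator, Tuple

# Simplified pattern: three IRIs followed by a dot. Literals are not matched.
IRI_TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def iter_iri_triples(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) for lines made of three IRIs."""
    for line in lines:
        match = IRI_TRIPLE.match(line)
        if match:
            yield match.group(1), match.group(2), match.group(3)


# Invented sample data compressed in memory, standing in for a *.nt.bz2 file.
sample = (
    b"<http://caligraph.org/resource/A> "
    b"<http://www.w3.org/2002/07/owl#sameAs> "
    b"<http://dbpedia.org/resource/A> .\n"
    b"<http://caligraph.org/resource/A> "
    b"<http://www.w3.org/2000/01/rdf-schema#label> \"A\"@en .\n"
)
compressed = io.BytesIO(bz2.compress(sample))

with bz2.open(compressed, "rt", encoding="utf-8") as handle:
    triples = list(iter_iri_triples(handle))
```

    For the actual files, bz2.open can be given a file path in place of the in-memory buffer; a dedicated RDF library would be the safer choice when literals or escaped characters matter.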

    Changelog

    v3.1.1

    Fixed an encoding issue in caligraph-ontology.nt.bz2

    v3.1.0

    Fixed several issues related to ontology consistency and structure

    v3.0.0

    Added functionality to group mentions of unknown entities into distinct entities

    v2.1.0

    Fixed an error that led to a class inheriting from a disjoint class

    Introduced owl:ObjectProperty and owl:DataProperty instead of rdf:Property

    Several cosmetic fixes

    v2.0.2

    Fixed incorrect formatting of some properties

    v2.0.1

    Better entity extraction and representation

    Small cosmetic fixes

    v2.0.0

    Entity extraction from arbitrary tables and enumerations in Wikipedia pages

    v1.4.0

    BERT-based recognition of subject entities and improved language models from spaCy 3.0

    v1.3.1

    Fixed minor encoding errors and improved formatting

    v1.3.0

    CaLiGraph is now based on a recent version of Wikipedia and DBpedia from November 2020

    v1.1.0

    Improved the CaLiGraph type hierarchy

    Many small bugfixes and improvements

    v1.0.9

    Additional alternative labels for CaLiGraph instances

    v1.0.8

    Small cosmetic changes to URIs to be closer to DBpedia URIs

    v1.0.7

    Mappings from CaLiGraph classes to DBpedia classes are now realised via rdfs:subClassOf instead of owl:equivalentClass

    Entities are now URL-encoded to improve accessibility

    v1.0.6

    Fixed a bug in the ontology creation step that led to a substantially lower amount of sub-type relationships than actually exist. The new version provides a richer type hierarchy that also leads to an increased amount of types for resources.

    v1.0.5

    Fixed a bug that declared CaLiGraph predicates as subclasses of owl:Predicate instead of being of the type owl:Predicate.

  11. Detecting Synonymous Relationships by Shared Data-driven Definitions

    • figshare.com
    txt
    Updated Dec 9, 2019
    Jan-Christoph Kalo (2019). Detecting Synonymous Relationships by Shared Data-driven Definitions [Dataset]. http://doi.org/10.6084/m9.figshare.11343785.v1
    Available download formats
    txt
    Dataset updated
    Dec 9, 2019
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jan-Christoph Kalo
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets that can be used together with the Code in: https://github.com/JanKalo/RuleAlign

  12. Rdf2Vec Dbpedia Inverse Page Rank Frequency Embeddings

    • explore.openaire.eu
    • data.niaid.nih.gov
    Updated Jun 19, 2017
    Michael Cochez; Petar Ristoski; Simone Paulo Ponzetto; Heiko Paulheim (2017). Rdf2Vec Dbpedia Inverse Page Rank Frequency Embeddings [Dataset]. http://doi.org/10.5281/zenodo.1320810
    Dataset updated
    Jun 19, 2017
    Authors
    Michael Cochez; Petar Ristoski; Simone Paulo Ponzetto; Heiko Paulheim
    Description

    This dataset contains the vectors from computing RDF2vec embeddings from an inverse Page Rank frequency weighted DBpedia 2016-04 graph. For each entity in the graph, the text file in the zip archive contains a line with the entity name and the embedded vector. The parameter settings for the embedding are as specified in the paper: Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, and Heiko Paulheim. 2017. Biased graph walks for RDF graph embeddings. In Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics (WIMS '17). ACM, New York, NY, USA, Article 21, 12 pages. DOI: https://doi.org/10.1145/3102254.3102279

  13. QBLink-KG: QBLink Adapted to DBpedia Knowledge Graph

    • figshare.com
    json
    Updated Feb 21, 2024
    Mona Zamiri; Yao Qiang; Fedor Nikolaev; Dongxiao Zhu; Alexander Kotov (2024). QBLink-KG: QBLink Adapted to DBpedia Knowledge Graph [Dataset]. http://doi.org/10.6084/m9.figshare.25256290.v3
    Available download formats
    json
    Dataset updated
    Feb 21, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mona Zamiri; Yao Qiang; Fedor Nikolaev; Dongxiao Zhu; Alexander Kotov
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    QBLink-KG is a modified version of QBLink, which is a high-quality benchmark for evaluating conversational understanding of Wikipedia content.

    QBLink consists of sequences of up to three hand-crafted queries, with responses being single named entities that match the titles of Wikipedia articles.

    For QBLink-KG, the English subset of the DBpedia snapshot from September 2021 was used as the target knowledge graph. QBLink answers provided as the titles of Wikipedia infoboxes can be easily mapped to DBpedia entity URIs, if the corresponding entities are present in DBpedia, since DBpedia is constructed through the extraction of information from Wikipedia infoboxes.

    QBLink, in its original format, is not directly applicable to Conversational Entity Retrieval from a Knowledge Graph (CER-KG) because knowledge graphs contain considerably less information than Wikipedia. A named entity serving as an answer to a QBLink query may not be present as an entity in DBpedia. To modify QBLink for CER over DBpedia, we implemented two filtering steps:

    1) We removed all queries for which the wiki_page field is empty, or the answer cannot be mapped to a DBpedia entity or does not match a Wikipedia page.
    2) For the evaluation of a model with specific techniques for entity linking and candidate selection, we excluded queries with answers that do not belong to the set of candidate entities derived using that model.

    The original QBLink dataset files before filtering are:

    • QBLink-train.json
    • QBLink-dev.json
    • QBLink-test.json

    The final QBLink-KG files after filtering are:

    • QBLink-Filtered-train.json
    • QBLink-Filtered-dev.json
    • QBLink-Filtered-test.json

    We used the following references to construct QBLink-KG:

    • Ahmed Elgohary, Chen Zhao, and Jordan Boyd-Graber. 2018. A dataset and baselines for sequential open-domain question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1077–1083, Brussels, Belgium. Association for Computational Linguistics.
    • https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2021-09
    • Lehmann, Jens et al. ‘DBpedia – A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia’. 1 Jan. 2015 : 167 – 195.

    For more details about QBLink-KG, please read our research paper: Zamiri, Mona, et al. "Benchmark and Neural Architecture for Conversational Entity Retrieval from a Knowledge Graph", The Web Conference 2024.
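
    The two filtering steps described for QBLink-KG can be sketched as follows. Only the wiki_page field name comes from the description; the record layout, the "text" field, the function names, and the simple title-to-URI mapping (spaces replaced by underscores under http://dbpedia.org/resource/) are assumptions for illustration, and real DBpedia URIs may need additional encoding.

```python
from typing import Dict, Iterable, List, Optional, Set


def to_dbpedia_uri(wiki_page: str) -> str:
    """Map a Wikipedia title to a DBpedia resource URI (simplified convention)."""
    return "http://dbpedia.org/resource/" + wiki_page.strip().replace(" ", "_")


def filter_queries(queries: Iterable[Dict[str, str]],
                   known_entities: Set[str],
                   candidates: Optional[Set[str]] = None) -> List[Dict[str, str]]:
    """Apply the two filtering steps to a list of query records."""
    kept: List[Dict[str, str]] = []
    for query in queries:
        page = (query.get("wiki_page") or "").strip()
        if not page:
            continue  # step 1: empty wiki_page field
        uri = to_dbpedia_uri(page)
        if uri not in known_entities:
            continue  # step 1: answer cannot be mapped to a DBpedia entity
        if candidates is not None and uri not in candidates:
            continue  # step 2: answer is outside the model's candidate set
        kept.append({**query, "dbpedia_uri": uri})
    return kept


# Invented records; "text" is a hypothetical field name.
queries = [
    {"text": "q1", "wiki_page": "Albert Einstein"},
    {"text": "q2", "wiki_page": ""},
    {"text": "q3", "wiki_page": "Some Missing Page"},
]
known = {"http://dbpedia.org/resource/Albert_Einstein"}
kept = filter_queries(queries, known)
```

    Passing a candidates set applies the second, model-specific step; leaving it as None applies only the first.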

  14. RDF2Vec DBpedia inverse page rank split embeddings

    • data.niaid.nih.gov
    • explore.openaire.eu
    Updated Jan 24, 2020
    Cochez, Michael (2020). RDF2Vec DBpedia inverse page rank split embeddings [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1320004
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Ponzetto, Simone Paulo
    Ristoski, Petar
    Paulheim, Heiko
    Cochez, Michael
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains the vectors from computing RDF2vec embeddings from an inverse page rank split weighted DBpedia 2016-04 graph.

    For each entity in the graph, the text file in the zip archive contains a line with the entity name and the embedded vector.

    The parameter settings for the embedding are as specified in the paper:

    Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, and Heiko Paulheim. 2017. Biased graph walks for RDF graph embeddings. In Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics (WIMS '17). ACM, New York, NY, USA, Article 21, 12 pages. DOI: https://doi.org/10.1145/3102254.3102279

  15. Untitled Item

    • figshare.com
    zip
    Updated Mar 19, 2024
    Huang Hao (2024). Untitled Item [Dataset]. http://doi.org/10.6084/m9.figshare.24769494.v1
    Available download formats
    zip
    Dataset updated
    Mar 19, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Huang Hao
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository includes all code and data for causal inference over the knowledge graph. It includes experiments over four datasets: the synthetic review dataset, the open review dataset, the subset of DBpedia related to the writer, and MIMIC-III (the MIMIC-III data is not provided for confidentiality reasons).

  16. Kglove Dbpedia Page Rank Split Frequency Embeddings

    • explore.openaire.eu
    • data.niaid.nih.gov
    Updated Oct 21, 2017
    Michael Cochez; Petar Ristoski; Simone Paulo Ponzetto; Heiko Paulheim (2017). Kglove Dbpedia Page Rank Split Frequency Embeddings [Dataset]. http://doi.org/10.5281/zenodo.1320169
    Dataset updated
    Oct 21, 2017
    Authors
    Michael Cochez; Petar Ristoski; Simone Paulo Ponzetto; Heiko Paulheim
    Description

    This dataset contains the vectors from computing KGloVe embeddings from a Page Rank split frequency weighted DBpedia 2016-04 graph. For each entity in the graph, the text file in the zip archive contains a line with the entity name and the embedded vector. The parameter settings for the embedding are as specified in the paper: Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, and Heiko Paulheim. 2017. Global RDF Vector Space Embeddings. In The Semantic Web – ISWC 2017: 16th International Semantic Web Conference, Vienna, Austria, October 21–25, 2017, Proceedings, Part I.

  17. QALD-9-Plus

    • figshare.com
    • paperswithcode.com
    txt
    Updated Dec 21, 2021
    Aleksandr Perevalov; Andreas Both; Dennis Diefenbach; Ricardo Usbeck (2021). QALD-9-Plus [Dataset]. http://doi.org/10.6084/m9.figshare.16864273.v7
    Available download formats
    txt
    Dataset updated
    Dec 21, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Aleksandr Perevalov; Andreas Both; Dennis Diefenbach; Ricardo Usbeck
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    QALD-9-Plus is a dataset for Knowledge Graph Question Answering (KGQA) based on the well-known QALD-9. QALD-9-Plus makes it possible to train and test KGQA systems over DBpedia and Wikidata using questions in 8 different languages. Some of the questions have several alternative writings in particular languages, which makes it possible to evaluate the robustness of KGQA systems and to train paraphrasing models. As the questions' translations were provided by native speakers, they are considered a "gold standard"; therefore, machine translation tools can be trained and evaluated on the dataset. Please see also the GitHub repository: https://github.com/Perevalov/qald_9_plus

  18. KGloVe DBpedia uniform embeddings

    • data.niaid.nih.gov
    • explore.openaire.eu
    Updated Jan 21, 2020
    Cochez, Michael (2020). KGloVe DBpedia uniform embeddings [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1320147
    Dataset updated
    Jan 21, 2020
    Dataset provided by
    Ponzetto, Simone Paulo
    Ristoski, Petar
    Paulheim, Heiko
    Cochez, Michael
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains the vectors from computing KGloVe embeddings from a uniformly weighted DBpedia 2016-04 graph.

    For each entity in the graph, the text file in the zip archive contains a line with the entity name and the embedded vector.

    The parameter settings for the embedding are as specified in the paper:

    Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, and Heiko Paulheim. 2017. Global RDF Vector Space Embeddings. In The Semantic Web – ISWC 2017: 16th International Semantic Web Conference, Vienna, Austria, October 21–25, 2017, Proceedings, Part I.

  19. RDF2Vec DBpedia uniform embeddings

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jan 24, 2020
    Michael Cochez; Petar Ristoski; Simone Paulo Ponzetto; Heiko Paulheim (2020). RDF2Vec DBpedia uniform embeddings [Dataset]. http://doi.org/10.5281/zenodo.1318146
    Available download formats
    zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Michael Cochez; Petar Ristoski; Simone Paulo Ponzetto; Heiko Paulheim
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains the vectors from computing RDF2vec embeddings from a uniformly weighted DBpedia 2016-04 graph.

    For each entity in the graph, the text file in the zip archive contains a line with the entity name and the embedded vector.

    The parameter settings for the embedding are as specified in the paper:

    Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, and Heiko Paulheim. 2017. Biased graph walks for RDF graph embeddings. In Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics (WIMS '17). ACM, New York, NY, USA, Article 21, 12 pages. DOI: https://doi.org/10.1145/3102254.3102279
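    The description says each line of the text file holds an entity name followed by its vector. A minimal Python sketch for reading such lines is shown below. It assumes whitespace-separated fields with the entity name first; the actual delimiter is not stated in the record, and the sample lines and the helper name parse_embedding_lines are made up for illustration.

```python
from typing import Dict, Iterable, List


def parse_embedding_lines(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse lines of the ASSUMED form '<entity> <v1> <v2> ... <vn>'."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and lines without any vector values
        entity, values = parts[0], parts[1:]
        vectors[entity] = [float(v) for v in values]
    return vectors


# Hypothetical sample lines; a real file holds one line per DBpedia entity.
sample = [
    "http://dbpedia.org/resource/Berlin 0.12 -0.40 0.05",
    "http://dbpedia.org/resource/Paris 0.10 -0.38 0.07",
]
embeddings = parse_embedding_lines(sample)
print(len(embeddings))
```

    For a real archive, the same function could be fed an open file handle, which yields one line at a time and avoids loading the whole file as a single string.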

  20. DBkWik Plus Plus

    • figshare.com
    bin
    Updated Sep 29, 2022
    Cite
    Sven Hertling; Heiko Paulheim (2022). DBkWik Plus Plus [Dataset]. http://doi.org/10.6084/m9.figshare.20407864.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Sep 29, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Sven Hertling; Heiko Paulheim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Large knowledge graphs like DBpedia and YAGO are always based on the same source, namely Wikipedia. But there are more wikis that contain information about long-tail entities, for example on wiki hosting platforms like Fandom. In this paper, we present the approach and analysis of DBkWik++, a fused knowledge graph from thousands of wikis. A modified version of the DBpedia framework is applied to each wiki, which results in many isolated knowledge graphs. With an incremental merge-based approach, we reuse one-to-one matching systems to solve the multi-source KG matching task. Based on this alignment, we create a consolidated knowledge graph with more than 15 million instances.

Cite
Martin Pekár Christensen; Matteo Lissandrini; Katja Hose (2024). DBpedia RDF2Vec Graph Embeddings [Dataset]. http://doi.org/10.5281/zenodo.6384728

DBpedia RDF2Vec Graph Embeddings

Explore at:
Available download formats: pdf, zip
Dataset updated
Jul 17, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Martin Pekár Christensen; Matteo Lissandrini; Katja Hose
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

DBpedia graph embeddings using RDF2Vec. RDF2Vec embedding generation code can be found here and is based on a publication by Portisch et al. [1].

The embeddings dataset consists of 200-dimensional vectors of DBpedia entities (from 1/9/2021).

A figure of cosine similarities between a selected set of DBpedia entities is provided in the dataset here.
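
For readers who want to reproduce such similarities from the vectors themselves, the following is a minimal Python sketch of cosine similarity between two embedding vectors. It is not taken from the dataset's code; the function name and the short example vectors are illustrative assumptions (the published vectors have 200 dimensions).

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Illustrative 3-dimensional vectors, not real DBpedia embeddings.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(same_direction, orthogonal)
```

Values close to 1 indicate vectors pointing in nearly the same direction, values near 0 indicate unrelated directions, and values near -1 indicate opposite directions.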

Generating Embeddings

The code for generating these embeddings can be found here.

Run the run.sh script, which wraps all the commands necessary to generate the embeddings:

bash run.sh

The script downloads a set of DBpedia files, which are listed in dbpedia_files.txt. It then builds a Docker image and runs a container of that image that generates the embeddings for the DBpedia graph defined by the DBpedia files.

A folder files is created containing all the downloaded DBpedia files, and a folder embeddings/dbpedia is created containing the embeddings in vectors.txt along with a set of random walk files.

Run Time of Embeddings Generation

Generating embeddings can take more than a day, depending on how many DBpedia files are downloaded. The following run time statistics were measured on a machine with 64 GB RAM, 8 cores (AMD EPYC, 1996.221 MHz), and a 1 TB SSD.

  • Total: 1 day, 8 hours, 52 minutes, 41 seconds
  • Walk generation: 0 days, 7 hours, 24 minutes, 36 seconds
  • Training: 1 day, 1 hour, 28 minutes, 5 seconds
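
The three durations can be cross-checked with a short Python sketch. It reads the walk generation time as 7 hours, 24 minutes, 36 seconds, which is the only reading under which walk generation plus training equals the reported total; the variable names are illustrative.

```python
from datetime import timedelta

# Durations taken from the run time list; walk generation is read as 7 hours.
walk_generation = timedelta(hours=7, minutes=24, seconds=36)
training = timedelta(days=1, hours=1, minutes=28, seconds=5)
reported_total = timedelta(days=1, hours=8, minutes=52, seconds=41)

computed_total = walk_generation + training
print(computed_total)                    # 1 day, 8:52:41
print(computed_total == reported_total)  # True
```

On this reading, training accounts for roughly three quarters of the total run time, so the training parameters are the natural place to look when trying to shorten a run.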

Parameters Used

The parameters used to generate the embeddings provided here are listed below:

  • Number of walks per entity: 100
  • Depth (hops) per walk: 4
  • Walk generation mode: RANDOM_WALKS_DUPLICATE_FREE
  • Threads: # of processors / 2
  • Training mode: sg
  • Embeddings vector dimension: 200
  • Minimum word2vec word count: 1
  • Sample rate: 0.0
  • Training window size: 5
  • Training epochs: 5
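
To make the walk parameters concrete, here is a toy Python sketch of duplicate-free random walk generation using the listed walk count, depth, and thread rule. It is not the generator used by the authors. It assumes that "duplicate free" means no identical walk is kept twice for an entity and that one hop is one edge traversal; the toy graph, the ex: identifiers, and the function name are hypothetical.

```python
import os
import random
from typing import Dict, List, Set, Tuple

# Values copied from the parameter list.
WALKS_PER_ENTITY = 100
DEPTH = 4
THREADS = max(1, (os.cpu_count() or 2) // 2)  # "# of processors / 2"

# Toy graph type: subject -> list of (predicate, object) edges.
Graph = Dict[str, List[Tuple[str, str]]]


def duplicate_free_walks(graph: Graph, start: str, n_walks: int,
                         depth: int, rng: random.Random) -> List[Tuple[str, ...]]:
    """Sample up to n_walks distinct random walks of at most `depth` hops."""
    walks: Set[Tuple[str, ...]] = set()
    # Bound the attempts: a small graph may offer fewer than n_walks distinct walks.
    for _ in range(n_walks * 10):
        if len(walks) >= n_walks:
            break
        walk = [start]
        node = start
        for _ in range(depth):
            edges = graph.get(node)
            if not edges:
                break  # dead end: the walk stops early
            predicate, node = rng.choice(edges)
            walk.extend([predicate, node])
        walks.add(tuple(walk))  # the set silently drops repeated walks
    return sorted(walks)


# Hypothetical three-entity graph, for illustration only.
toy_graph: Graph = {
    "ex:Berlin": [("ex:capitalOf", "ex:Germany"), ("ex:locatedIn", "ex:Europe")],
    "ex:Germany": [("ex:partOf", "ex:Europe")],
}
walks = duplicate_free_walks(toy_graph, "ex:Berlin", WALKS_PER_ENTITY, DEPTH,
                             random.Random(0))
for walk in walks:
    print(" -> ".join(walk))
```

Each walk becomes one "sentence" for the word2vec-style training step, which is where the remaining parameters (skip-gram mode, dimension 200, window 5, minimum count 1, 5 epochs) apply.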