2 datasets found
  1. Data from: DBpedia RDF2Vec Graph Embeddings

    • explore.openaire.eu
    • data.niaid.nih.gov
    Updated Jan 1, 2022
    Cite
    Martin Pekár Christensen; Matteo Lissandrini; Katja Hose (2022). DBpedia RDF2Vec Graph Embeddings [Dataset]. http://doi.org/10.5281/zenodo.6376306
    Authors
    Martin Pekár Christensen; Matteo Lissandrini; Katja Hose
    Description

    DBpedia graph embeddings using RDF2Vec. RDF2Vec embedding generation code can be found here and is based on a publication by Portisch et al. [1].

    The embeddings dataset consists of 200-dimensional vectors of DBpedia entities (from 1/9/2021).

    Generating Embeddings

    The code for generating these embeddings can be found here.

    Run the run.sh script, which wraps all the commands necessary to generate the embeddings:

    bash run.sh

    The script downloads a set of DBpedia files, which are listed in dbpedia_files.txt. It then builds a Docker image and runs a container of that image that generates the embeddings for the DBpedia graph defined by the DBpedia files.

    A folder named files is created containing all the downloaded DBpedia files, and a folder embeddings/dbpedia is created containing the embeddings in vectors.txt along with a set of random walk files.

    Run Time of Embeddings Generation

    Generating embeddings can take more than a day, depending on the number of DBpedia files chosen for download. The following run time statistics were measured on a machine with 64 GB RAM, 8 cores (AMD EPYC, 1996.221 MHz), and a 1 TB SSD.

    • Total: 1 day, 8 hours, 52 minutes, 41 seconds
    • Walk generation: 0 days, 7 hours, 24 minutes, 36 seconds
    • Training: 1 day, 1 hour, 28 minutes, 5 seconds

    Parameters Used

    The following parameters were used to generate the embeddings provided here:

    • Number of walks per entity: 100
    • Depth (hops) per walk: 4
    • Walk generation mode: RANDOM_WALKS_DUPLICATE_FREE
    • Threads: # of processors / 2
    • Training mode: sg
    • Embeddings vector dimension: 200
    • Minimum word2vec word count: 1
    • Sample rate: 0.0
    • Training window size: 5
    • Training epochs: 5

    References

    [1] Portisch, J., Hladik, M. and Paulheim, H., 2020. RDF2Vec Light - A Lightweight Approach for Knowledge Graph Embeddings. arXiv preprint arXiv:2009.07659.

  2. DBpedia RDF2Vec Graph Embeddings

    • zenodo.org
    pdf, zip
    Updated Jul 17, 2024
    Cite
    Martin Pekár Christensen; Matteo Lissandrini; Katja Hose (2024). DBpedia RDF2Vec Graph Embeddings [Dataset]. http://doi.org/10.5281/zenodo.6384728
    Available download formats: pdf, zip
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Martin Pekár Christensen; Matteo Lissandrini; Katja Hose
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    DBpedia graph embeddings using RDF2Vec. RDF2Vec embedding generation code can be found here and is based on a publication by Portisch et al. [1].

    The embeddings dataset consists of 200-dimensional vectors of DBpedia entities (from 1/9/2021).

    A figure of cosine similarities between a selected set of DBpedia entities is provided in the dataset here.
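Cosine similarity compares the direction of two embedding vectors and ignores their length. The sketch below shows the computation in plain Python; the three-dimensional vectors and the names entity_a, entity_b and entity_c are made-up stand-ins for the 200-dimensional vectors in the dataset, not values taken from it.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors (assumption: real vectors have 200 dimensions).
entity_a = [0.9, 0.1, 0.3]
entity_b = [0.8, 0.2, 0.4]
entity_c = [-0.2, 0.9, -0.5]

print(cosine_similarity(entity_a, entity_b))  # close to 1: similar direction
print(cosine_similarity(entity_a, entity_c))  # negative: dissimilar direction
```

Values near 1 indicate vectors pointing the same way, values near 0 indicate unrelated directions, and negative values indicate opposing directions.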

    Generating Embeddings

    The code for generating these embeddings can be found here.

    Run the run.sh script, which wraps all the commands necessary to generate the embeddings:

    bash run.sh

    The script downloads a set of DBpedia files, which are listed in dbpedia_files.txt. It then builds a Docker image and runs a container of that image that generates the embeddings for the DBpedia graph defined by the DBpedia files.

    A folder named files is created containing all the downloaded DBpedia files, and a folder embeddings/dbpedia is created containing the embeddings in vectors.txt along with a set of random walk files.
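The layout of vectors.txt is not described here. The sketch below assumes the common word2vec text layout, in which each line holds an entity identifier followed by whitespace-separated floating-point values. That layout, the function name load_vectors, and the sample lines are all assumptions for illustration.

```python
import io


def load_vectors(lines, expected_dim=None):
    """Parse lines of the assumed form '<entity> <v1> ... <vN>' into a dict."""
    vectors = {}
    for line_no, line in enumerate(lines, start=1):
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        entity = parts[0]
        values = [float(x) for x in parts[1:]]
        if expected_dim is not None and len(values) != expected_dim:
            raise ValueError(
                f"line {line_no}: expected {expected_dim} values, got {len(values)}"
            )
        vectors[entity] = values
    return vectors


# Hypothetical sample with 3 dimensions instead of 200.
sample = io.StringIO(
    "http://dbpedia.org/resource/Berlin 0.1 0.2 0.3\n"
    "http://dbpedia.org/resource/Hamburg 0.2 0.1 0.4\n"
)
vectors = load_vectors(sample, expected_dim=3)
print(len(vectors))
```

For the real file, the same function could be called as load_vectors(open("embeddings/dbpedia/vectors.txt", encoding="utf-8"), expected_dim=200). If the file starts with a header line or uses identifiers containing spaces, the parser would need adjusting.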

    Run Time of Embeddings Generation

    Generating embeddings can take more than a day, depending on the number of DBpedia files chosen for download. The following run time statistics were measured on a machine with 64 GB RAM, 8 cores (AMD EPYC, 1996.221 MHz), and a 1 TB SSD.

    • Total: 1 day, 8 hours, 52 minutes, 41 seconds
    • Walk generation: 0 days, 7 hours, 24 minutes, 36 seconds
    • Training: 1 day, 1 hour, 28 minutes, 5 seconds
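As a quick consistency check, the two phases add up to the reported total when walk generation is taken as 7 hours, 24 minutes, 36 seconds. The sketch below does the arithmetic with Python's standard timedelta type; the variable names are illustrative.

```python
from datetime import timedelta

# Durations of the two phases, as reported in the statistics above.
walk_generation = timedelta(hours=7, minutes=24, seconds=36)
training = timedelta(days=1, hours=1, minutes=28, seconds=5)

total = walk_generation + training
print(total)  # 1 day, 8:52:41

# Fraction of the total run time spent on training.
print(training / total)
```

Training accounts for roughly three quarters of the run time, so training settings such as the number of epochs are the natural place to look first when shortening a run.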

    Parameters Used

    The following parameters were used to generate the embeddings provided here:

    • Number of walks per entity: 100
    • Depth (hops) per walk: 4
    • Walk generation mode: RANDOM_WALKS_DUPLICATE_FREE
    • Threads: # of processors / 2
    • Training mode: sg
    • Embeddings vector dimension: 200
    • Minimum word2vec word count: 1
    • Sample rate: 0.0
    • Training window size: 5
    • Training epochs: 5
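To illustrate what the walk parameters control, the sketch below samples random walks from a toy graph. It is not the generator used for this dataset: it assumes "duplicate free" means that no two walks collected for the same entity are identical, and the function name, the max_attempts cut-off, and the ex: identifiers are hypothetical.

```python
import random


def duplicate_free_walks(graph, start, num_walks=100, depth=4, seed=0,
                         max_attempts=None):
    """Sample up to num_walks distinct random walks of at most depth hops.

    graph maps an entity to a list of (predicate, object) pairs.
    """
    rng = random.Random(seed)
    if max_attempts is None:
        max_attempts = num_walks * 10  # stop when few distinct walks exist
    walks = set()
    for _ in range(max_attempts):
        if len(walks) >= num_walks:
            break
        walk, node = [start], start
        for _ in range(depth):
            edges = graph.get(node, [])
            if not edges:
                break  # dead end: keep the shorter walk
            predicate, node = rng.choice(edges)
            walk.extend([predicate, node])
        walks.add(tuple(walk))  # the set discards identical walks
    return sorted(walks)


# Hypothetical toy graph; real input would be the DBpedia triples.
graph = {
    "ex:A": [("ex:p", "ex:B"), ("ex:q", "ex:C")],
    "ex:B": [("ex:r", "ex:C")],
    "ex:C": [],
}
for walk in duplicate_free_walks(graph, "ex:A"):
    print(" -> ".join(walk))
```

Each walk is then treated as a sentence for word2vec training, which is where the remaining parameters (skip-gram mode, dimension 200, window size 5, minimum count 1, 5 epochs) apply.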

Data from: DBpedia RDF2Vec Graph Embeddings

22 scholarly articles cite this dataset (View in Google Scholar)
