15 datasets found
  1. dbpedia-entity-generated-queries

    • huggingface.co
    Updated Aug 30, 2022
    Cite
    BEIR (2022). dbpedia-entity-generated-queries [Dataset]. https://huggingface.co/datasets/BeIR/dbpedia-entity-generated-queries
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Aug 30, 2022
    Dataset authored and provided by
    BEIR
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for BEIR Benchmark

      Dataset Summary
    

    BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:

    Fact-checking: FEVER, Climate-FEVER, SciFact
    Question-Answering: NQ, HotpotQA, FiQA-2018
    Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
    News Retrieval: TREC-NEWS, Robust04
    Argument Retrieval: Touche-2020, ArguAna
    Duplicate Question Retrieval: Quora, CqaDupstack
    Citation-Prediction: SCIDOCS
    Tweet
    See the full description on the dataset page: https://huggingface.co/datasets/BeIR/dbpedia-entity-generated-queries.

  2. DBpedia derived abstracts for text classification

    • zenodo.org
    csv
    Updated Jul 19, 2024
    Cite
    Mariano Rico (2024). DBpedia derived abstracts for text classification [Dataset]. http://doi.org/10.5281/zenodo.12783744
    Available download formats: csv
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mariano Rico
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Derived datasets obtained by processing the raw texts from the DBpedia dataset.

  3. SimpleDBpediaQA Dataset

    • paperswithcode.com
    • opendatalab.com
    Cite
    Michael Azmy; Peng Shi; Jimmy Lin; Ihab Ilyas, SimpleDBpediaQA Dataset [Dataset]. https://paperswithcode.com/dataset/simpledbpediaqa
    Authors
    Michael Azmy; Peng Shi; Jimmy Lin; Ihab Ilyas
    Description

    A new benchmark dataset for simple question answering over knowledge graphs that was created by mapping SimpleQuestions entities and predicates from Freebase to DBpedia.

  4. Billion Triple Challenge (BTC) 2019 Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Käfer, Tobias (2020). Billion Triple Challenge (BTC) 2019 Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2634587
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Hogan, Aidan
    Käfer, Tobias
    Herrera, Jose Miguel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Billion Triple Challenge (BTC) 2019 Dataset is the result of a large-scale RDF crawl (accepting RDF/XML, Turtle and N-Triples) conducted from 2018/12/12 until 2019/01/11 using LDspider. The data are stored as quads where the fourth element encodes the location of the Web document from which the associated triple was parsed. The dataset contains 2,155,856,033 quads, collected from 2,641,253 RDF documents on 394 pay-level domains. Merging the data into one RDF graph results in 256,059,356 unique triples. These data (as quads or triples) contain 38,156 unique predicates and instances of 120,037 unique classes.

    If you would like to use this dataset as part of a research work, we would ask you to please consider citing our paper:

    José-Miguel Herrera, Aidan Hogan and Tobias Käfer. "BTC-2019: The 2019 Billion Triple Challenge Dataset". In the Proceedings of the 18th International Semantic Web Conference (ISWC), Auckland, New Zealand, October 26–30, 2019 (Resources track).

    The dataset is published in three main parts:

    Quads: (*nq.gz): contains the quads retrieved during the crawl (N-Quads, GZipped). These data are divided into individual files for each of the top 100 pay-level-domains by number of quads contributed (btc2019-[domain]_000XX.nq.gz). While most domains have one file, larger domains are further split into parts (000XX) with approximately 150,000,000 quads each. Finally, quads from the 294 domains not in the top 100 are merged into one file: btc2019-other_00001.nq.gz.

    Triples: (btc2019-triples.nt.gz): contains the unique triples resulting from taking all quads, dropping the fourth element (indicating the location of the source document) and computing the unique triples.

    VoID (void.nt): contains a VoID file offering statistics about the dataset.

    For parsing the files, we recommend a streaming parser, such as Raptor, RDF4j/Rio, or NxParser.
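    The quads-to-triples step described above (dropping the fourth element and deduplicating) can be sketched in a few lines. The following is a minimal illustration only, not the tooling used to build the dataset; the simplified regular expression does not cover every N-Quads corner case, so for real work prefer one of the streaming parsers recommended above.

    ```python
    import gzip
    import re

    # A simplified N-Quads line pattern: subject, predicate, object,
    # then the graph IRI (the source-document location), then " .".
    QUAD_RE = re.compile(r'^(\S+)\s+(\S+)\s+(.+?)\s+(<[^>]+>)\s*\.\s*$')

    def quads(path):
        """Stream (s, p, o, g) tuples from a GZipped N-Quads file."""
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                m = QUAD_RE.match(line)
                if m:
                    yield m.groups()

    def unique_triples(path):
        """Drop the graph element of each quad and deduplicate."""
        seen = set()
        for s, p, o, _g in quads(path):
            seen.add((s, p, o))
        return seen
    ```

    Applied to the full dump, this deduplication is what reduces the 2,155,856,033 quads to 256,059,356 unique triples.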

    The data are sourced from 2,641,253 RDF documents. The top-10 pay-level-domains in terms of documents contributed are:

    dbpedia.org 162,117 documents (6.14%)

    loc.gov 150,091 documents (5.68%)

    bnf.fr 146,186 documents (5.53%)

    sudoc.fr 144,877 documents (5.49%)

    theses.fr 141,228 documents (5.35%)

    wikidata.org 141,207 documents (5.35%)

    linkeddata.es 130,459 documents (4.94%)

    getty.edu 130,398 documents (4.94%)

    fao.org 92,838 documents (3.51%)

    ontobee.org 92,812 documents (3.51%)

    The data contain 2,155,856,033 quads. The top-10 pay-level-domains in terms of quads contributed are:

    wikidata.org 2,006,338,975 quads (93.06%)

    dbpedia.org 36,686,161 quads (1.70%)

    idref.fr 22,013,225 quads (1.02%)

    bnf.fr 12,618,155 quads (0.59%)

    getty.edu 7,453,134 quads (0.35%)

    sudoc.fr 7,176,301 quads (0.33%)

    loc.gov 6,725,390 quads (0.31%)

    linkeddata.es 6,485,114 quads (0.30%)

    theses.fr 4,820,874 quads (0.22%)

    ontologycentral.com 4,633,947 quads (0.21%)

    The data contain 256,059,356 unique triples. The top-10 pay-level-domains in terms of unique triples contributed are:

    wikidata.org 133,535,555 triples (52.15%)

    dbpedia.org 32,981,420 triples (12.88%)

    idref.fr 16,820,681 triples (6.57%)

    bnf.fr 11,769,268 triples (4.60%)

    getty.edu 6,571,525 triples (2.57%)

    linkeddata.es 5,898,762 triples (2.30%)

    loc.gov 5,362,064 triples (2.09%)

    sudoc.fr 4,972,647 triples (1.94%)

    ontologycentral.com 4,471,962 triples (1.75%)

    theses.fr 4,095,897 triples (1.60%)

    If you wish to download all N-Quads files, the following may be useful to copy and paste in Unix:

    wget https://zenodo.org/record/2634588/files/btc2019-acropolis.org.uk_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-aksw.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-babelnet.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-bbc.co.uk_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-berkeleybop.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-bibliotheken.nl_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-bl.uk_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-bne.es_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-bnf.fr_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-camera.it_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-cervantesvirtual.com_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-chemspider.com_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-cnr.it_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-comicmeta.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-crossref.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-cvut.cz_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-d-nb.info_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-datacite.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-dbpedia.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-dbtune.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-drugbank.ca_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-ebi.ac.uk_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-ebu.ch_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-ebusiness-unibw.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-edamontology.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-europa.eu_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-fao.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-gbv.de_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-geonames.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-geospecies.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-geovocab.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-gesis.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-getty.edu_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-github.io_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-githubusercontent.com_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-glottolog.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-iconclass.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-idref.fr_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-iflastandards.info_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-ign.fr_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-iptc.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-kanzaki.com_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-kasei.us_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-kit.edu_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-kjernsmo.net_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-korrekt.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-kulturarvsdata.se_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-kulturnav.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-l3s.de_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-lehigh.edu_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-lexvo.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-linkeddata.es_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-linkedopendata.gr_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-linkedresearch.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-loc.gov_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-lu.se_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-mcu.es_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-medra.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-myexperiment.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-ndl.go.jp_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-nih.gov_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-nobelprize.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-okfn.gr_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-ontobee.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-ontologycentral.com_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-openei.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-openlibrary.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-orcid.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-ordnancesurvey.co.uk_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-oszk.hu_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-other_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-persee.fr_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-pokepedia.fr_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-princeton.edu_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-productontology.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-rdaregistry.info_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-rdvocab.info_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-reegle.info_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-rhiaro.co.uk_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-schema.org_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-sf.net_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-simia.net_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-sti2.at_00001.nq.gz
    wget https://zenodo.org/record/2634588/files/btc2019-stoa.org_00001.nq.gz
    wget

  5. QBLink-KG: QBLink Adapted to DBpedia Knowledge Graph

    • figshare.com
    json
    Updated Feb 21, 2024
    Cite
    Mona Zamiri; Yao Qiang; Fedor Nikolaev; Dongxiao Zhu; Alexander Kotov (2024). QBLink-KG: QBLink Adapted to DBpedia Knowledge Graph [Dataset]. http://doi.org/10.6084/m9.figshare.25256290.v3
    Available download formats: json
    Dataset updated
    Feb 21, 2024
    Dataset provided by
    figshare
    Authors
    Mona Zamiri; Yao Qiang; Fedor Nikolaev; Dongxiao Zhu; Alexander Kotov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    QBLink-KG is a modified version of QBLink, a high-quality benchmark for evaluating conversational understanding of Wikipedia content. QBLink consists of sequences of up to three hand-crafted queries, with responses being single named entities that match the titles of Wikipedia articles.

    For QBLink-KG, the English subset of the DBpedia snapshot from September 2021 was used as the target knowledge graph. QBLink answers provided as the titles of Wikipedia infoboxes can be easily mapped to DBpedia entity URIs (if the corresponding entities are present in DBpedia), since DBpedia is constructed by extracting information from Wikipedia infoboxes.

    QBLink, in its original format, is not directly applicable to Conversational Entity Retrieval from a Knowledge Graph (CER-KG) because knowledge graphs contain considerably less information than Wikipedia. A named entity serving as an answer to a QBLink query may not be present as an entity in DBpedia. To adapt QBLink for CER over DBpedia, we implemented two filtering steps: 1) we removed all queries for which the wiki_page field is empty, or the answer cannot be mapped to a DBpedia entity or does not match a Wikipedia page; 2) for the evaluation of a model with specific techniques for entity linking and candidate selection, we excluded queries with answers that do not belong to the set of candidate entities derived using that model.

    The original QBLink dataset files before filtering are:

    QBLink-train.json
    QBLink-dev.json
    QBLink-test.json

    The final QBLink-KG files after filtering are:

    QBLink-Filtered-train.json
    QBLink-Filtered-dev.json
    QBLink-Filtered-test.json

    We used the following references to construct QBLink-KG:

    Ahmed Elgohary, Chen Zhao, and Jordan Boyd-Graber. 2018. A dataset and baselines for sequential open-domain question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1077–1083, Brussels, Belgium. Association for Computational Linguistics.

    https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2021-09

    Lehmann, Jens, et al. "DBpedia – A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia." 1 Jan. 2015: 167–195.

    For more details about QBLink-KG, please read our research paper:

    Zamiri, Mona, et al. "Benchmark and Neural Architecture for Conversational Entity Retrieval from a Knowledge Graph", The Web Conference 2024.
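    The title-to-URI mapping described above can be illustrated with a small helper. This is an assumption-laden sketch, not QBLink-KG's actual code: DBpedia resource IRIs generally mirror Wikipedia article titles with spaces replaced by underscores, and a real pipeline must additionally verify that the entity exists in the target DBpedia snapshot.

    ```python
    from urllib.parse import quote

    def title_to_dbpedia_uri(title: str) -> str:
        """Map a Wikipedia page title to a candidate DBpedia entity URI.

        Illustrative only: replaces spaces with underscores and
        percent-encodes characters DBpedia does not keep verbatim.
        Existence in the DBpedia snapshot still has to be checked.
        """
        return "http://dbpedia.org/resource/" + quote(
            title.replace(" ", "_"), safe="_()',.-:"
        )

    print(title_to_dbpedia_uri("Albert Einstein"))
    # http://dbpedia.org/resource/Albert_Einstein
    ```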

  6. DBpedia.fr, the French chapter of DBpedia

    • data.europa.eu
    sparql, turtle
    Cite
    Fabien Gandon, DBpedia.fr, the French chapter of DBpedia [Dataset]. https://data.europa.eu/data/datasets/53699212a3a729239d203e16?locale=da
    Available download formats: sparql, turtle
    Dataset authored and provided by
    Fabien Gandon
    License

    CC BY-SA: http://www.opendefinition.org/licenses/cc-by-sa

    Area covered
    France, French
    Description

    DBpédia.fr, or DBpédia en français, is the French-language chapter of DBpedia. It is part of DBpedia's internationalization effort, which aims to maintain structured data extracted from the various language chapters of Wikipedia. The development of DBpedia in French takes place within the framework of the Sémanticpédia platform, whose partners are: the Wimmics team at Inria, the Ministry of Culture and Communication, and the Wikimédia France association.

  7. DBpedia.fr, the French chapter of DBpedia

    • data.europa.eu
    sparql, turtle
    Cite
    Fabien Gandon, DBpedia.fr, the French chapter of DBpedia [Dataset]. https://data.europa.eu/data/datasets/53699212a3a729239d203e16?locale=pt
    Available download formats: sparql, turtle
    Dataset authored and provided by
    Fabien Gandon
    License

    CC BY-SA: http://www.opendefinition.org/licenses/cc-by-sa

    Description

    DBpédia.fr, or DBpédia en français, is the French-language chapter of DBpedia. It is part of DBpedia's internationalization effort, which aims to maintain structured data extracted from the various language chapters of Wikipedia. The development of DBpedia in French takes place within the framework of the Sémanticpédia platform, whose partners are: the Wimmics team at Inria, the Ministry of Culture and Communication, and the Wikimédia France association.

  8. RDF datasets for testing keyword search algorithms

    • figshare.com
    application/gzip
    Updated Dec 14, 2020
    Cite
    Angelo Batista Neves Júnior; Luiz André Portes Paes Leme (2020). RDF datasets for testing keyword search algorithms [Dataset]. http://doi.org/10.6084/m9.figshare.11347676.v3
    Available download formats: application/gzip
    Dataset updated
    Dec 14, 2020
    Dataset provided by
    figshare
    Authors
    Angelo Batista Neves Júnior; Luiz André Portes Paes Leme
    License

    GPL 3.0: https://www.gnu.org/licenses/gpl-3.0.html

    Description

    This repository contains five RDF datasets used to test keyword search algorithms: 1) DBpedia, 2) IMDb, 3) Mondial, 4) LUBM, and 5) BSBM.

  9. The number of intersections for DBpedia-Yago

    • figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Fernando Benites; Svenja Simon; Elena Sapozhnikova (2023). The number of intersections for DBpedia-Yago. [Dataset]. http://doi.org/10.1371/journal.pone.0084475.t006
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Fernando Benites; Svenja Simon; Elena Sapozhnikova
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The number of intersections for DBpedia-Yago.

  10. DBpedia.fr, the French chapter of DBpedia

    • data.europa.eu
    sparql, turtle
    Updated Apr 10, 2025
    Cite
    Fabien Gandon (2025). DBpedia.fr, the French chapter of DBpedia [Dataset]. https://data.europa.eu/data/datasets/53699212a3a729239d203e16?locale=et
    Available download formats: sparql, turtle
    Dataset updated
    Apr 10, 2025
    Dataset authored and provided by
    Fabien Gandon
    License

    CC BY-SA: http://www.opendefinition.org/licenses/cc-by-sa

    Area covered
    France, French
    Description

    DBpédia.fr, or DBpédia en français, is the French-language chapter of DBpedia. It is part of DBpedia's internationalization effort, which aims to maintain structured data extracted from the various language chapters of Wikipedia. The development of DBpedia in French takes place within the framework of the Sémanticpédia platform, whose partners are: the Wimmics team at Inria, the Ministry of Culture and Communication, and the Wikimédia France association.

  11. Self-contained ground-truths for cross-domain linkage

    • figshare.com
    zip
    Updated Apr 28, 2016
    Cite
    Mayank Kejriwal (2016). Self-contained ground-truths for cross-domain linkage [Dataset]. http://doi.org/10.6084/m9.figshare.3204325.v1
    Available download formats: zip
    Dataset updated
    Apr 28, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mayank Kejriwal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cross-domain knowledge bases such as DBpedia, Freebase and YAGO have emerged as encyclopedic hubs in the Web of Linked Data. Despite enabling several practical applications in the Semantic Web, the large-scale, schema-free nature of such graphs often precludes research groups from employing them widely as evaluation test cases for entity resolution and instance-based ontology alignment applications. Although the ground-truth linkages between the three knowledge bases above are available, they are not amenable to resource-limited applications. One reason is that the ground-truth files are not self-contained, meaning that a researcher must usually perform a series of expensive joins (typically in MapReduce) to obtain usable information sets. We constructed this resource by uploading several publicly licensed data resources to the public cloud and used simple Hadoop clusters to compile, and make accessible, three cross-domain self-contained test cases involving linked instances from DBpedia, Freebase and YAGO. Self-containment is enabled by virtue of a simple NoSQL JSON-like serialization format. Potential applications for these resources, particularly related to testing transfer learning research hypotheses, are described in more detail in a paper submission in the resource track at ISWC 2016.
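    The "self-contained" serialization described above can be pictured as one JSON record per linkage, bundling the linked instances from all three knowledge bases so no joins are required. The field names and property values below are purely illustrative assumptions, not the resource's actual schema:

    ```python
    import json

    # Hypothetical self-contained linkage record: each line carries the
    # DBpedia, Freebase and YAGO instances together with local property
    # sets, so a resource-limited consumer never needs a MapReduce join.
    record = {
        "dbpedia": {
            "id": "http://dbpedia.org/resource/Berlin",
            "properties": {"rdfs:label": "Berlin"},
        },
        "freebase": {"id": "/m/0156q", "properties": {"type.object.name": "Berlin"}},
        "yago": {"id": "Berlin", "properties": {"rdfs:label": "Berlin"}},
    }

    line = json.dumps(record)      # one self-contained line per linkage
    restored = json.loads(line)    # round-trips without external lookups
    ```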

  12. WikipediaGS Dataset

    • paperswithcode.com
    Updated Jul 4, 2022
    Cite
    Vasilis Efthymiou; Oktie Hassanzadeh; Mariano Rodriguez-Muro; Vassilis Christophides (2022). WikipediaGS Dataset [Dataset]. https://paperswithcode.com/dataset/wikipediags
    Dataset updated
    Jul 4, 2022
    Authors
    Vasilis Efthymiou; Oktie Hassanzadeh; Mariano Rodriguez-Muro; Vassilis Christophides
    Description

    The WikipediaGS dataset was created by extracting Wikipedia tables from Wikipedia pages. It consists of 485,096 tables which were annotated with DBpedia entities for the Cell Entity Annotation (CEA) task.

    Additionally, a subset of these tables was annotated by Chen et al. for the Column Type Annotation (CTA) task and includes 604 tables, where selected columns were annotated using DBpedia types. This subset is available for download at their official Github repository.

    The table below shows the number of annotated cells/columns for each task and the number of different classes used for the annotation.

    Task | Annotations | Classes
    CEA  | 4,453,329   | 1,222,358
    CTA  | 620         | 31

  13. WebNLG Dataset

    • paperswithcode.com
    Updated Jun 9, 2021
    Cite
    Claire Gardent; Anastasia Shimorina; Shashi Narayan; Laura Perez-Beltrachini (2021). WebNLG Dataset [Dataset]. https://paperswithcode.com/dataset/webnlg
    Dataset updated
    Jun 9, 2021
    Authors
    Claire Gardent; Anastasia Shimorina; Shashi Narayan; Laura Perez-Beltrachini
    Description

    The WebNLG corpus comprises sets of triplets describing facts (entities and relations between them) paired with the corresponding facts in the form of natural-language text. The corpus contains sets with up to 7 triplets each, along with one or more reference texts for each set. The test set is split into two parts: seen, containing inputs created for entities and relations belonging to DBpedia categories that were seen in the training data, and unseen, containing inputs extracted for entities and relations belonging to 5 unseen categories.

    Initially, the dataset was used for the WebNLG natural language generation challenge which consists of mapping the sets of triplets to text, including referring expression generation, aggregation, lexicalization, surface realization, and sentence segmentation. The corpus is also used for a reverse task of triplets extraction.

    Versioning history of the dataset can be found here.

    It's also available here: https://huggingface.co/datasets/web_nlg Note: "The v3 release (release_v3.0_en, release_v3.0_ru) for the WebNLG2020 challenge also supports a semantic parsing task."
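    The shape of a WebNLG-style instance (a triple set paired with reference verbalizations) can be sketched as a plain data structure. The field names here are assumptions for illustration, not the official release schema; Alan Bean is one of the DBpedia entities covered by the corpus:

    ```python
    # Illustrative WebNLG-style instance: RDF triples plus one or more
    # human-written reference texts. Generation maps triples -> text;
    # the reverse task extracts triples from the text.
    example = {
        "triples": [
            ("Alan_Bean", "birthPlace", "Wheeler,_Texas"),
            ("Alan_Bean", "occupation", "Test_pilot"),
        ],
        "references": [
            "Alan Bean, born in Wheeler, Texas, worked as a test pilot.",
        ],
    }

    # Triple sets in the corpus contain at most 7 triples.
    assert len(example["triples"]) <= 7
    ```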

  14. QALD-9-Plus Dataset

    • paperswithcode.com
    • opendatalab.com
    • +1more
    Updated Jan 30, 2022
    + more versions
    Cite
    Aleksandr Perevalov; Dennis Diefenbach; Ricardo Usbeck; Andreas Both (2022). QALD-9-Plus Dataset [Dataset]. https://paperswithcode.com/dataset/qald-9-plus
    Dataset updated
    Jan 30, 2022
    Authors
    Aleksandr Perevalov; Dennis Diefenbach; Ricardo Usbeck; Andreas Both
    Description

    QALD-9-Plus is a dataset for Knowledge Graph Question Answering (KGQA) based on the well-known QALD-9 benchmark.

    QALD-9-Plus makes it possible to train and test KGQA systems over DBpedia and Wikidata using questions in 9 different languages: English, German, Russian, French, Armenian, Belarusian, Lithuanian, Bashkir, and Ukrainian.

    Some questions have several alternative formulations in particular languages, which makes it possible to evaluate the robustness of KGQA systems and to train paraphrasing models.

    As the questions' translations were provided by native speakers, they are considered a "gold standard"; machine translation tools can therefore be trained and evaluated on the dataset.

    Dataset Statistics

    |       | en  | de  | fr  | ru   | uk  | lt  | be  | ba  | hy | # questions DBpedia | # questions Wikidata |
    |-------|-----|-----|-----|------|-----|-----|-----|-----|----|---------------------|----------------------|
    | Train | 408 | 543 | 260 | 1203 | 447 | 468 | 441 | 284 | 80 | 408                 | 371                  |
    | Test  | 150 | 176 | 26  | 348  | 176 | 186 | 155 | 117 | 20 | 150                 | 136                  |

    Given these numbers, it is clear that some languages are covered more than once, i.e., there is more than one translation for a particular question. For example, 1203 Russian translations are available while only 408 unique questions exist in the training subset (about 2.9 Russian translations per question). Such parallel corpora enable researchers, developers and other dataset users to address the paraphrasing task.
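    The 2.9 translations-per-question figure quoted above can be checked directly from the table:

    ```python
    # Sanity-check the translations-per-question ratio for Russian
    # using the training-split counts from the statistics table.
    russian_translations = 1203
    unique_train_questions = 408
    ratio = russian_translations / unique_train_questions
    print(round(ratio, 1))  # 2.9
    ```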

  15. Automatically Extracted SHACL Shapes for WikiData, DBpedia, YAGO-4, and LUBM...

    • data.niaid.nih.gov
    Updated Feb 3, 2023
    + more versions
    Cite
    Rabbani, Kashif (2023). Automatically Extracted SHACL Shapes for WikiData, DBpedia, YAGO-4, and LUBM & Associated Coverage Statistics [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5958985
    Dataset updated
    Feb 3, 2023
    Dataset provided by
    Rabbani, Kashif
    Lissandrini, Matteo
    Hose, Katja
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The uploaded datasets contain automatically extracted SHACL shapes for the following datasets:

    WikiData (the truthy dump from September 2021 filtered by removing non-English strings) [1]

    DBpedia [2]

    YAGO-4 [3]

    LUBM (scale factor 500) [4]

    The validating shapes for these datasets are generated by a program that parses the corresponding RDF files (in .nt format). The extracted shapes encode various SHACL constraints, e.g., sh:minCount, sh:path, sh:class, sh:datatype, etc. For each shape, we encode its coverage, i.e., the number of entities satisfying that shape, using the void:entities predicate.
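    To make the constraint kinds listed above concrete, here is a minimal illustrative node shape in Turtle. The class, property, and count values are placeholders, not shapes actually extracted by the tool:

    ```turtle
    @prefix sh:   <http://www.w3.org/ns/shacl#> .
    @prefix void: <http://rdfs.org/ns/void#> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
    @prefix dbo:  <http://dbpedia.org/ontology/> .
    @prefix ex:   <http://example.org/shapes#> .

    ex:PersonShape a sh:NodeShape ;
        sh:targetClass dbo:Person ;
        void:entities 1000000 ;            # coverage: entities satisfying this shape
        sh:property [
            sh:path dbo:birthDate ;
            sh:datatype xsd:date ;
            sh:minCount 1 ;
        ] ;
        sh:property [
            sh:path dbo:birthPlace ;
            sh:class dbo:Place ;
        ] .
    ```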

    We have provided, as an executable JAR file, the program we developed to extract these SHACL shapes. More details about the datasets used to extract these shapes, and how to run the JAR, are available on our GitHub repository: https://github.com/dkw-aau/qse.

    Read more about our Quality Shapes Extraction (QSE) tool on our website https://relweb.cs.aau.dk/qse/

    [1] Vrandečić, Denny, and Markus Krötzsch. "Wikidata: a free collaborative knowledgebase." Communications of the ACM 57.10 (2014): 78-85.

    [2] Auer, Sören, et al. "Dbpedia: A nucleus for a web of open data." The semantic web. Springer, Berlin, Heidelberg, 2007. 722-735.

    [3] Pellissier Tanon, Thomas, Gerhard Weikum, and Fabian Suchanek. "Yago 4: A reason-able knowledge base." European Semantic Web Conference. Springer, Cham, 2020.

    [4] Guo, Yuanbo, Zhengxiang Pan, and Jeff Heflin. "LUBM: A benchmark for OWL knowledge base systems." Journal of Web Semantics 3.2-3 (2005): 158-182.

