CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Wikipedia is the largest and most-read free online encyclopedia in existence. As such, Wikipedia offers a large amount of data about all of its contents and the interactions around them, as well as different types of open data sources. This makes Wikipedia a unique data source that can be analyzed with quantitative data science techniques. However, the enormous amount of data makes it difficult to get an overview, and many of the analytical possibilities that Wikipedia offers remain unknown. To reduce the complexity of identifying and collecting Wikipedia data and to expand its analytical potential, we collected data from various sources, processed them, and generated a dedicated Wikipedia Knowledge Graph aimed at facilitating the analysis and contextualization of the activity and relations of Wikipedia pages, in this case limited to the English edition. We share this Knowledge Graph dataset openly, aiming for it to be useful to a wide range of researchers, such as informetricians, sociologists or data scientists.
The dataset consists of 9 files, all in TSV format, built under a relational structure. The page file forms the core of the dataset; 4 files contain entities related to the Wikipedia pages (the category, url, pub and page_property files), and 4 further files act as intermediate tables that connect the pages both to those entities and to each other (the page_category, page_url, page_pub and page_link files).
The document Dataset_summary includes a detailed description of the dataset.
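As a minimal sketch of how the relational structure can be used, the snippet below joins pages to their categories through the page_category intermediate table. The exact file and column names (page.tsv, page_id, category_id, etc.) are assumptions and should be checked against the Dataset_summary document.

import pandas as pd

# Load the core page file, one entity file and the intermediate table that
# connects them (file and column names assumed from the description above).
pages = pd.read_csv("page.tsv", sep="\t")
categories = pd.read_csv("category.tsv", sep="\t")
page_category = pd.read_csv("page_category.tsv", sep="\t")

# Join pages to their categories via the intermediate table.
pages_with_categories = (
    pages.merge(page_category, on="page_id")
         .merge(categories, on="category_id")
)
print(pages_with_categories.head())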
Thanks to Nees Jan van Eck and the Centre for Science and Technology Studies (CWTS) for the valuable comments and suggestions.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Dataset Card for Wikipedia Knowledge Graph
The dataset contains 16,958,654 ontologies extracted from a subset of selected Wikipedia articles.
Dataset Creation
The dataset was created by processing a subset of the English Wikipedia 20231101.en dataset with an LLM. The initial knowledge base dataset was used as the basis from which the ontologies were extracted. Pipeline: Wikipedia article → Chunking → Fact extraction (Knowledge base dataset) → Ontology extraction from facts → Ontologies… See the full description on the dataset page: https://huggingface.co/datasets/Jotschi/wikipedia_knowledge_graph_en.
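A minimal sketch for loading the dataset from the Hugging Face Hub with the datasets library; the split and column names are not documented here, so the returned object should be inspected first.

from datasets import load_dataset

ds = load_dataset("Jotschi/wikipedia_knowledge_graph_en")
print(ds)                      # shows the available splits and features
first_split = next(iter(ds.values()))
print(first_split[0])          # inspect one extracted-ontology record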
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CaLiGraph is a large-scale semantic knowledge graph with a rich ontology compiled from the DBpedia ontology and Wikipedia categories & list pages. For more information, visit http://caligraph.org
Information about the uploaded files (all files are bzip2-compressed and in N-Triples format; a short loading sketch follows the file list below):
caligraph-metadata.nt.bz2 Metadata about the dataset which is described using void vocabulary.
caligraph-ontology.nt.bz2 Class definitions, property definitions, restrictions, and labels of the CaLiGraph ontology.
caligraph-ontology_dbpedia-mapping.nt.bz2 Mapping of classes and properties to the DBpedia ontology.
caligraph-ontology_provenance.nt.bz2 Provenance information about classes (i.e. which Wikipedia category or list page has been used to create this class).
caligraph-instances_types.nt.bz2 Definition of instances and (non-transitive) types.
caligraph-instances_transitive-types.nt.bz2 Transitive types for instances (can also be induced by a reasoner).
caligraph-instances_labels.nt.bz2 Labels for instances.
caligraph-instances_relations.nt.bz2 Relations between instances derived from the class restrictions of the ontology (can also be induced by a reasoner).
caligraph-instances_dbpedia-mapping.nt.bz2 Mapping of instances to respective DBpedia instances.
caligraph-instances_provenance.nt.bz2 Provenance information about instances (e.g. if the instance has been extracted from a Wikipedia list page).
dbpedia_caligraph-instances.nt.bz2 Additional instances of CaLiGraph that are not in DBpedia. Note: this file is not part of CaLiGraph but should rather be used as an extension to DBpedia. The triples use the DBpedia namespace and can thus be used to extend DBpedia directly.
dbpedia_caligraph-types.nt.bz2 Additional types of CaLiGraph that are not in DBpedia. Note: this file is not part of CaLiGraph but should rather be used as an extension to DBpedia. The triples use the DBpedia namespace and can thus be used to extend DBpedia directly.
dbpedia_caligraph-relations.nt.bz2 Additional relations of CaLiGraph that are not in DBpedia. Note: this file is not part of CaLiGraph but should rather be used as an extension to DBpedia. The triples use the DBpedia namespace and can thus be used to extend DBpedia directly.
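As a quick way to inspect any of these files, the sketch below streams a bzip2-compressed N-Triples file without fully decompressing it; the file name is taken from the list above, and a full RDF toolkit such as rdflib can be used instead for the smaller files.

import bz2

# Stream one of the bzip2-compressed N-Triples files and print the first few
# statements; each line of an .nt file is one complete triple.
with bz2.open("caligraph-metadata.nt.bz2", "rt", encoding="utf-8") as f:
    for i, line in enumerate(f):
        print(line.rstrip())
        if i >= 4:
            break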
Changelog
v3.1.1
Fixed an encoding issue in caligraph-ontology.nt.bz2
v3.1.0
Fixed several issues related to ontology consistency and structure
v3.0.0
Added functionality to group mentions of unknown entities into distinct entities
v2.1.0
Fixed an error that led to a class inheriting from a disjoint class
Introduced owl:ObjectProperty and owl:DatatypeProperty instead of rdf:Property
Several cosmetic fixes
v2.0.2
Fixed incorrect formatting of some properties
v2.0.1
Better entity extraction and representation
Small cosmetic fixes
v2.0.0
Entity extraction from arbitrary tables and enumerations in Wikipedia pages
v1.4.0
BERT-based recognition of subject entities and improved language models from spaCy 3.0
v1.3.1
Fixed minor encoding errors and improved formatting
v1.3.0
CaLiGraph is now based on a recent version of Wikipedia and DBpedia from November 2020
v1.1.0
Improved the CaLiGraph type hierarchy
Many small bugfixes and improvements
v1.0.9
Additional alternative labels for CaLiGraph instances
v1.0.8
Small cosmetic changes to URIs to be closer to DBpedia URIs
v1.0.7
Mappings from CaLiGraph classes to DBpedia classes are now realised via rdfs:subClassOf instead of owl:equivalentClass
Entities are now URL-encoded to improve accessibility
v1.0.6
Fixed a bug in the ontology creation step that led to a substantially lower number of sub-type relationships than actually exist. The new version provides a richer type hierarchy, which also leads to an increased number of types for resources.
v1.0.5
Fixed a bug that declared CaLiGraph predicates as subclasses of owl:Predicate instead of as instances of owl:Predicate.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Wikidata is a free and open knowledge base that can be read and edited by both humans and machines. Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others.
KeySearchWiki is a dataset for evaluating keyword search systems over Wikidata. The dataset was automatically generated by leveraging Wikidata and Wikipedia set categories (e.g., Category:American television directors) as data sources for both relevant entities and queries. Relevant entities are gathered by carefully navigating the Wikipedia set-category hierarchy in all available languages. Furthermore, those categories are refined and combined to derive more complex queries. Detailed information about KeySearchWiki and its generation can be found on the GitHub page.
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Documentation on the data format and how to use it can be found at https://github.com/IBM/wikicausal and in our paper:
@unpublished{,
  author = {Oktie Hassanzadeh and Mark Feblowitz},
  title  = {{WikiCausal}: Corpus and Evaluation Framework for Causal Knowledge Graph Construction},
  year   = {2023},
  doi    = {10.5281/zenodo.7897996}
}
Corpus derived from Wikipedia and Wikidata.
Refer to Wikipedia and Wikidata license and terms of use for more details:
Permission is granted to copy, distribute and/or modify Wikipedia's text under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported License and, unless otherwise noted, the GNU Free Documentation License, unversioned, with no invariant sections, front-cover texts, or back-cover texts.
A copy of the Creative Commons Attribution-ShareAlike 3.0 Unported License is included in the section entitled "Wikipedia:Text of Creative Commons Attribution-ShareAlike 3.0 Unported License"
A copy of the GNU Free Documentation License is included in the section entitled "GNU Free Documentation License".
Content on Wikipedia is covered by disclaimers.
THIS DATA IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
We manage unique archives, documentation and photographic material, and the largest art-historical library on Western art from the Late Middle Ages to the present, with a focus on Netherlandish art. Our collections cover not only paintings, drawings and sculptures, but also monumental art, modern media and design. The collections are present in both digital and analogue form (the latter in our study rooms).
This knowledge graph represents our collection as Linked Data, primarily using the CIDOC-CRM and LinkedArt vocabularies.
This dataset was created by Ved Prakash
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Experiment results together with queries, runs, and relevance judgments produced in the context of evaluating different retrieval methods using the KeySearchWiki dataset.
Detailed information about KeySearchWiki and its generation can be found on the GitHub page.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Wikidata5m is a million-scale knowledge graph dataset with an aligned corpus. This dataset integrates the Wikidata knowledge graph and Wikipedia pages. Each entity in Wikidata5m is described by a corresponding Wikipedia page, which enables the evaluation of link prediction over unseen entities.
This file contains the inductive split of Wikidata5m knowledge graph.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Large knowledge graphs like DBpedia and YAGO are always based on the same source, namely Wikipedia. But there are many more wikis that contain information about long-tail entities, for example on wiki hosting platforms such as Fandom. In this paper, we present the approach and analysis of DBkWik++, a fused knowledge graph built from thousands of wikis. A modified version of the DBpedia framework is applied to each wiki, which results in many isolated knowledge graphs. With an incremental merge-based approach, we reuse one-to-one matching systems to solve the multi-source KG matching task. Based on this alignment, we create a consolidated knowledge graph with more than 15 million instances.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Wikidata5M-KG
Wikidata5M-KG is an open-domain knowledge graph constructed from Wikipedia and Wikidata. It contains approximately 4.6 million entities and 21 million triples. Wikidata5M-KG is built on top of the Wikidata5M dataset.
📦 Contents
wikidata5m_kg.tar.gz
This is the processed knowledge graph used in our experiments. It contains:
4,665,331 entities
810 relations
20,987,217 triples
After extraction, it yields a single file: wikidata5m_kg.jsonl, each… See the full description on the dataset page: https://huggingface.co/datasets/Alphonse7/Wikidata5M-KG.
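A minimal sketch for unpacking the archive and peeking at one record; it assumes only that wikidata5m_kg.jsonl holds one JSON object per line, since the per-record schema is not given here.

import json
import tarfile

# Extract the archive, which yields wikidata5m_kg.jsonl.
with tarfile.open("wikidata5m_kg.tar.gz", "r:gz") as tar:
    tar.extractall()

# Inspect the first record to discover its keys rather than assuming them.
with open("wikidata5m_kg.jsonl", "r", encoding="utf-8") as f:
    first = json.loads(next(f))
print(first.keys())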
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains two public knowledge graph datasets used in our paper Improving the Utility of Knowledge Graph Embeddings with Calibration. Each dataset is described below.
Note that for our experiments we split each dataset randomly 5 times into 80/10/10 train/validation/test splits. We recommend that users of our data do the same to avoid (potentially) overfitting models to a single dataset split.
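A sketch of that protocol under the assumption that triples.tsv is a headerless tab-separated file of head/relation/tail triples; it produces five independent 80/10/10 splits with fixed seeds.

import pandas as pd

# Assumed layout: one triple per line, tab-separated, no header row.
triples = pd.read_csv("triples.tsv", sep="\t", header=None)

for seed in range(5):
    shuffled = triples.sample(frac=1.0, random_state=seed)
    n = len(shuffled)
    train = shuffled.iloc[: int(0.8 * n)]
    valid = shuffled.iloc[int(0.8 * n): int(0.9 * n)]
    test = shuffled.iloc[int(0.9 * n):]
    train.to_csv(f"train_{seed}.tsv", sep="\t", header=False, index=False)
    valid.to_csv(f"valid_{seed}.tsv", sep="\t", header=False, index=False)
    test.to_csv(f"test_{seed}.tsv", sep="\t", header=False, index=False)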
wikidata-authors
This dataset was extracted by querying the Wikidata API for facts about people categorized as "authors" or "writers" on Wikidata. Note that all head entities of triples are people (authors or writers), and all triples describe something about that person (e.g., their place of birth, their place of death, or their spouse). The knowledge graph has 23,887 entities, 13 relations, and 86,376 triples.
The files are as follows:
entities.tsv: A tab-separated file of all unique entities in the dataset. The fields are as follows:
eid: The unique Wikidata identifier of this entity. You can find the corresponding Wikidata page at https://www.wikidata.org/wiki/<eid>.
label: A human-readable label of this entity (extracted from Wikidata).
relations.tsv: A tab-separated file of all unique relations in the dataset. The fields are as follows:
rid: The unique Wikidata identifier of this relation. You can find the corresponding Wikidata page at https://www.wikidata.org/wiki/Property:<rid>.
label: A human-readable label of this relation (extracted from Wikidata).
triples.tsv: A tab-separated file of all triples in the dataset, in the form of <head eid>, <rid>, <tail eid>.
fb15krr-linked
This dataset is an extended version of the FB15k+ dataset provided by [Xie et al IJCAI16]. It has been linked to Wikidata using Freebase MIDs (machine IDs) as keys; we discarded triples from the original dataset that contained entities that could not be linked to Wikidata. We also removed reverse relations following the procedure described by [Toutanova and Chen CVSC2015]. Finally, we removed existing triples labeled as False and added predicted triples labeled as True based on the crowdsourced annotations we obtained in our True or False Facts experiment (see our paper for details). The knowledge graph consists of 14,289 entities, 770 relations, and 272,385 triples.
The files are as follows:
entities.tsv: A tab-separated file of all unique entities in the dataset. The fields are as follows:
mid: The Freebase machine ID (MID) of this entity.
wiki: The corresponding unique Wikidata identifier of this entity. You can find the corresponding Wikidata page at https://www.wikidata.org/wiki/<wiki>.
label: A human-readable label of this entity (extracted from Wikidata).
types: All hierarchical types of this entity, as provided by [Xie et al IJCAI16].
relations.tsv: A tab-separated file of all unique relations in the dataset. The fields are as follows:
label: The hierarchical Freebase label of this relation.
triples.tsv: A tab-separated file of all triples in the dataset, in the form of <head mid>, <relation label>, <tail mid>.
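A sketch that attaches human-readable labels to the triples using entities.tsv; it assumes that entities.tsv has a header row with the fields listed above (identifier in the first column, plus a label column) and that triples.tsv is headerless.

import pandas as pd

entities = pd.read_csv("entities.tsv", sep="\t")
triples = pd.read_csv("triples.tsv", sep="\t", header=None,
                      names=["head", "relation", "tail"])

# Map head and tail identifiers to their human-readable labels.
labels = dict(zip(entities.iloc[:, 0], entities["label"]))
triples["head_label"] = triples["head"].map(labels)
triples["tail_label"] = triples["tail"].map(labels)
print(triples.head())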
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This Wikipedia entry describes the Knowledge Graph as a knowledge base by Google. It enhances the search engine's results by gathering information from a variety of sources. The entry consists of the history, description, a criticism, references and links to external sources. Modified: 2018-06-30 This Wikipedia article is deposited in EASY in order to assign a persistent identifier to it.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
QBLink-KG is a modified version of QBLink, a high-quality benchmark for evaluating conversational understanding of Wikipedia content. QBLink consists of sequences of up to three hand-crafted queries, with responses being single named entities that match the titles of Wikipedia articles. For QBLink-KG, the English subset of the DBpedia snapshot from September 2021 was used as the target Knowledge Graph. QBLink answers provided as the titles of Wikipedia infoboxes can be easily mapped to DBpedia entity URIs - if the corresponding entities are present in DBpedia - since DBpedia is constructed through the extraction of information from Wikipedia infoboxes.
QBLink, in its original format, is not directly applicable to Conversational Entity Retrieval from a Knowledge Graph (CER-KG), because knowledge graphs contain considerably less information than Wikipedia: a named entity serving as an answer to a QBLink query may not be present as an entity in DBpedia. To adapt QBLink for CER over DBpedia, we implemented two filtering steps: 1) we removed all queries for which the wiki_page field is empty, or whose answer cannot be mapped to a DBpedia entity or does not match a Wikipedia page; 2) for the evaluation of a model with specific techniques for entity linking and candidate selection, we excluded queries with answers that do not belong to the set of candidate entities derived using that model.
The original QBLink dataset files before filtering are:
QBLink-train.json
QBLink-dev.json
QBLink-test.json
The final QBLink-KG files after filtering are:
QBLink-Filtered-train.json
QBLink-Filtered-dev.json
QBLink-Filtered-test.json
We used the references below to construct QBLink-KG:
Ahmed Elgohary, Chen Zhao, and Jordan Boyd-Graber. 2018. A dataset and baselines for sequential open-domain question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1077–1083, Brussels, Belgium. Association for Computational Linguistics.
https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2021-09
Lehmann, Jens et al. 'DBpedia – A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia'. 1 Jan. 2015: 167–195.
For more details about QBLink-KG, please read our research paper:
Zamiri, Mona, et al. "Benchmark and Neural Architecture for Conversational Entity Retrieval from a Knowledge Graph", The Web Conference 2024.
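The first filtering step can be sketched roughly as below, assuming each QBLink file holds a list of query records with wiki_page and answer fields; map_to_dbpedia_uri is a hypothetical stand-in for the actual mapping from infobox titles to DBpedia entity URIs.

import json

def map_to_dbpedia_uri(answer):
    # Hypothetical stand-in: the real mapping must check that the entity
    # exists in the DBpedia 2021-09 snapshot; here we only build a candidate
    # URI from the title and return None for empty answers.
    if not answer:
        return None
    return "http://dbpedia.org/resource/" + answer.replace(" ", "_")

with open("QBLink-train.json", "r", encoding="utf-8") as f:
    records = json.load(f)   # assumed to be a list of query records

filtered = [
    r for r in records
    if r.get("wiki_page") and map_to_dbpedia_uri(r.get("answer")) is not None
]

with open("QBLink-Filtered-train.json", "w", encoding="utf-8") as f:
    json.dump(filtered, f)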
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
WikiEvents is a knowledge-graph-based dataset for NLP and event-related machine learning tasks. This dataset includes RDF data in JSON-LD about events between January 2020 and December 2022. It was extracted from the Wikipedia Current events portal, Wikidata, OpenStreetMap Nominatim and Falcon 2.0. The extractor is available on GitHub under semantic-systems/current-events-to-kg. The RDF data for each month is split into four graph modules: the base graph module contains events and event summaries with references from named entities to Wikipedia articles; the ohg graph module contains all one-hop graphs (ohg) around the referenced Wikidata entities; the osm graph module contains spatial data from OpenStreetMap (OSM); and the raw graph module contains the raw HTML objects of events and article infoboxes. This repository additionally includes two JSON files with training samples used for entity linking and event-related location extraction. They were created using queries to the WikiEvents dataset uploaded to this repository.
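The JSON-LD graph modules can be loaded with a standard RDF library. The sketch below uses rdflib (version 6 or later bundles the JSON-LD parser); the file name is a hypothetical placeholder, since the per-module file names are not listed here.

from rdflib import Graph

g = Graph()
g.parse("base_2020_01.jsonld", format="json-ld")   # hypothetical file name
print(len(g), "triples loaded")

# Peek at a few event triples.
for subj, pred, obj in list(g)[:5]:
    print(subj, pred, obj)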
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
WikiWiki is a dataset for understanding entities and their place in a taxonomy of knowledge—their types. It consists of entities and passages from 10M Wikipedia articles linked to the Wikidata knowledge graph with 41K types.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Wikidata Descriptions Dataset
wikidata_descriptions pairs English Wikipedia article titles (wiki_title) and their Wikidata IDs (qid) with the English "description" available in Wikidata. The corpus contains 26,205 entities. Wikidata descriptions are short, one-line summaries that concisely state what an entity is. They can be used as lightweight contextual information in entity linking, search, question answering, knowledge-graph completion and many other NLP / IR tasks. … See the full description on the dataset page: https://huggingface.co/datasets/masaki-sakata/wikidata_descriptions.
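A minimal sketch for loading the corpus with the datasets library; the field names wiki_title, qid and description come from the description above, while the split name is left to inspection.

from datasets import load_dataset

ds = load_dataset("masaki-sakata/wikidata_descriptions")
print(ds)                              # shows the available splits
row = next(iter(ds.values()))[0]
print(row["wiki_title"], row["qid"], row["description"])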
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Resulting axioms and assertions from applying the Cat2Ax approach to the DBpedia knowledge graph.
The methodology is described in the conference publication "N. Heist, H. Paulheim: Uncovering the Semantics of Wikipedia Categories, International Semantic Web Conference, 2019".