License: CC0 1.0 (https://bioregistry.io/spdx:CC0-1.0)
Wikidata is a collaboratively edited knowledge base operated by the Wikimedia Foundation. It is intended to provide a common source of certain types of data which can be used by Wikimedia projects such as Wikipedia. Wikidata functions as a document-oriented database, centred on individual items. Items represent topics, for which basic information is stored that identifies each topic.
License: CC0 1.0 (https://choosealicense.com/licenses/cc0-1.0/)
Wikidata Extraction
This dataset contains all RDF triples extracted from the latest Wikidata, converted from the N-Triples format to Parquet. The data originates from Wikidata, a free and open knowledge base that acts as central storage for structured data used by Wikipedia and other Wikimedia projects. The source file is the "truthy" N-Triples dump (latest-truthy.nt.bz2), which contains only the current, non-deprecated statements. The code to extract this data is available at… See the full description on the dataset page: https://huggingface.co/datasets/piebro/wikidata-extraction.
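The first step of that conversion is splitting each "truthy" N-Triples line into its subject, predicate, and object. A minimal sketch of that step (a real dump needs a proper N-Triples parser, e.g. to handle escaped literals):

```python
# Hedged sketch: splitting one simple N-Triples line into its three terms.
# The QIDs/PIDs below are standard Wikidata URIs used for illustration.
def parse_ntriple(line: str):
    """Split an N-Triples line 'S P O .' into (subject, predicate, object)."""
    s, p, o = line.rstrip(" .\n").split(" ", 2)
    return s, p, o

line = ('<http://www.wikidata.org/entity/Q42> '
        '<http://www.wikidata.org/prop/direct/P31> '
        '<http://www.wikidata.org/entity/Q5> .\n')
print(parse_ntriple(line))
```

Rows produced this way can then be batched into columnar form and written out as Parquet.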
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset provides entity mappings between Freebase and Wikidata, enabling seamless integration between two large-scale knowledge graphs. It is based on the Wikidata data dump from October 28, 2013, and was originally published by Google under the CC0 (Public Domain) license.
The mappings are carefully filtered to ensure high reliability.
This strict filtering results in high-confidence entity alignments, making the dataset useful for research and real-world applications in knowledge graph systems.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This is a dump from Wikidata from 2018-12-17 in JSON. This one is no longer available from Wikidata. It was downloaded originally from https://dumps.wikimedia.org/other/wikidata/20181217.json.gz and recompressed to fit on Zenodo.
License: CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Persons of interest profiles from Wikidata, the structured data version of Wikipedia.
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Mapping between Freebase and Wikidata entities
This dataset maps Freebase IDs to Wikidata IDs and labels. It is useful for visualising and better understanding datasets like FB15k-237.
How it was created:
1. Download the Freebase-Wikidata mapping from here. [compressed size: 21.2 MB]
2. Download the Wikidata entities data from here. [compressed size: 81 GB]
3. Align the labels with the Freebase and Wikidata IDs.
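The alignment step above can be sketched as a simple join of the two downloads. The in-memory dicts and sample values below are illustrative stand-ins for the actual mapping and entity files:

```python
# Hedged sketch: joining a Freebase-to-Wikidata ID mapping with entity labels.
# Sample IDs/labels are illustrative, not taken from the referenced files.
fb_to_wd = {          # from the Freebase-Wikidata mapping file
    "/m/02mjmr": "Q76",
    "/m/0d3k14": "Q9696",
}
wd_labels = {         # from the Wikidata entities data
    "Q76": "Barack Obama",
    "Q9696": "John F. Kennedy",
}

# Align each Freebase ID with its Wikidata ID and English label
aligned = {fb: (qid, wd_labels.get(qid)) for fb, qid in fb_to_wd.items()}
print(aligned["/m/02mjmr"])  # ('Q76', 'Barack Obama')
```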
License: MIT (https://opensource.org/licenses/MIT)
License information was derived automatically
Dataset Summary
The Triple-to-Text Alignment dataset aligns Knowledge Graph (KG) triples from Wikidata with diverse, real-world textual sources extracted from the web. Unlike previous datasets that rely primarily on Wikipedia text, this dataset provides a broader range of writing styles, tones, and structures by leveraging Wikidata references from various sources such as news articles, government reports, and scientific literature. Large language models (LLMs) were used to extract and validate text spans corresponding to KG triples, ensuring high-quality alignments. The dataset can be used for training and evaluating relation extraction (RE) and knowledge graph construction systems.

Data Fields
Each row in the dataset consists of the following fields:
- subject (str): The subject entity of the knowledge graph triple.
- rel (str): The relation that connects the subject and object.
- object (str): The object entity of the knowledge graph triple.
- text (str): A natural language sentence that entails the given triple.
- validation (str): LLM-based validation results, including:
  - Fluent Sentence(s): TRUE/FALSE
  - Subject mentioned in Text: TRUE/FALSE
  - Relation mentioned in Text: TRUE/FALSE
  - Object mentioned in Text: TRUE/FALSE
  - Fact Entailed By Text: TRUE/FALSE
  - Final Answer: TRUE/FALSE
- reference_url (str): URL of the web source from which the text was extracted.
- subj_qid (str): Wikidata QID for the subject entity.
- rel_id (str): Wikidata Property ID for the relation.
- obj_qid (str): Wikidata QID for the object entity.

Dataset Creation
The dataset was created through the following process:
1. Triple-Reference Sampling and Extraction: All relations from Wikidata were extracted using SPARQL queries. A sample of KG triples with associated reference URLs was collected for each relation.
2. Domain Analysis and Web Scraping: URLs were grouped by domain, and sampled pages were analyzed to determine their primary language. English-language web pages were scraped and processed to extract plaintext content.
3. LLM-Based Text Span Selection and Validation: LLMs were used to identify text spans from web content that correspond to KG triples. A Chain-of-Thought (CoT) prompting method was applied to validate whether the extracted text entailed the triple. The validation process included checking for fluency, subject mention, relation mention, object mention, and final entailment.
4. Final Dataset Statistics: 12.5K Wikidata relations were analyzed, leading to 3.3M triple-reference pairs. After filtering for English content, 458K triple-web content pairs were processed with LLMs. 80.5K validated triple-text alignments were included in the final dataset.
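A common first step for consumers of this dataset is keeping only rows whose LLM validation ended in a positive final verdict. The exact serialization of the validation field is an assumption here; this sketch just checks for the "Final Answer: TRUE" flag named in the field description:

```python
# Hedged sketch: filtering rows by the LLM validation verdict. The sample
# rows and the validation string layout are illustrative assumptions.
def is_validated(row: dict) -> bool:
    """Keep only rows whose validation ends with Final Answer: TRUE."""
    return "Final Answer: TRUE" in row.get("validation", "")

rows = [
    {"subject": "Douglas Adams", "rel": "educated at",
     "object": "St John's College",
     "text": "Adams studied at St John's College, Cambridge.",
     "validation": "Fluent Sentence(s): TRUE\nSubject mentioned in Text: TRUE\n"
                   "Relation mentioned in Text: TRUE\nObject mentioned in Text: TRUE\n"
                   "Fact Entailed By Text: TRUE\nFinal Answer: TRUE"},
    {"subject": "Q1", "rel": "P2", "object": "Q3",
     "text": "Unrelated sentence.",
     "validation": "Final Answer: FALSE"},
]

validated = [r for r in rows if is_validated(r)]
print(len(validated))  # 1
```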
With this feature, the user can extend CSV datasets with existing information in the Wikidata KG. The tool applies entity linking to all concepts in the same column and lets the user use the extracted entities to extend the dataset.
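The column-extension idea can be sketched with a toy link table standing in for the real entity-linking step against the Wikidata KG (the QIDs for Berlin and Paris are real; the linker itself is a placeholder):

```python
# Hedged sketch: extending a CSV column with Wikidata QIDs via a toy
# entity linker. A real tool would link labels against the live KG.
import csv
import io

linker = {"Berlin": "Q64", "Paris": "Q90"}  # toy link table: label -> QID

src = io.StringIO("city\nBerlin\nParis\n")
out = io.StringIO()
reader = csv.DictReader(src)
writer = csv.DictWriter(out, fieldnames=["city", "city_qid"])
writer.writeheader()
for row in reader:
    # Every cell in the column goes through the same linking step
    writer.writerow({"city": row["city"],
                     "city_qid": linker.get(row["city"], "")})

print(out.getvalue())
```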
License: CC0 1.0 (https://choosealicense.com/licenses/cc0-1.0/)
Wikidata Entity Embeddings 0.2
Dataset Summary
Wikidata Entity Embeddings is a dataset of embedding vectors for Wikidata entities. Each vector represents a Wikidata item (Q...) or property (P...) based on textual information extracted from Wikidata. The dataset is part of the Wikidata Embedding Project, an initiative led by Wikimedia Deutschland in collaboration with Jina AI and IBM DataStax. The project provides a publicly accessible Wikidata Vector Database to enable… See the full description on the dataset page: https://huggingface.co/datasets/philippesaade/Wikidata_Vectors_0.2.
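Entity vectors like these are typically compared by cosine similarity. A minimal sketch with made-up toy vectors (real vectors come from the published dataset):

```python
# Hedged sketch: cosine similarity between entity embedding vectors.
# The three 3-dimensional vectors are invented toy values for illustration.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

emb = {
    "Q42": [0.12, 0.80, -0.05],   # an item vector (toy values)
    "Q5":  [0.10, 0.75, -0.02],   # another item vector (toy values)
    "P31": [-0.60, 0.10, 0.90],   # a property vector (toy values)
}

print(cosine_similarity(emb["Q42"], emb["Q5"]))   # close to 1.0
print(cosine_similarity(emb["Q42"], emb["P31"]))  # much lower
```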
License: CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Category-based imports from Wikidata, the structured data version of Wikipedia.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This is a collection of gender indicators from Wikidata and Wikipedia for human biographies. Data is derived from the 2016-01-03 Wikidata snapshot.
Each file describes the humans in Wikidata aggregated by gender (Property:P21) and disaggregated by the following Wikidata properties:
- Date of Birth (P569)
- Date of Death (P570)
- Place of Birth (P19)
- Country of Citizenship (P27)
- Ethnic Group (P172)
- Field of Work (P101)
- Occupation (P106)
- Wikipedia Language ("Sitelinks")
Further aggregations of the data are:
- World Map (countries derived from place of birth and citizenship)
- World Cultures (Inglehart-Welzel Map applied to the World Map)
- Gender Co-Occurrence (humans with multiple genders)
Wikidata labels have been translated to English for convenience when possible. You may still see values with "QIDs", which means no English translation was available. Where there are multiple values, such as for occupation, the gender is counted as co-occurring with each occupation separately.
For more information: http://wigi.wmflabs.org/
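The per-gender aggregation and the multiple-value counting rule can be sketched on a few toy rows (real input would be the humans extracted from the snapshot):

```python
# Hedged sketch: aggregating humans by gender (P21), and counting a gender
# once per occupation when an item has multiple occupation values.
from collections import Counter

# (QID, gender label, occupation labels) -- invented sample rows
humans = [
    ("Q1000", "female", ["politician", "lawyer"]),
    ("Q1001", "male",   ["writer"]),
    ("Q1002", "female", ["writer"]),
]

# Aggregate by gender
by_gender = Counter(g for _, g, _ in humans)

# Gender x occupation: each occupation value contributes separately
by_gender_occ = Counter((g, occ) for _, g, occs in humans for occ in occs)

print(by_gender)
print(by_gender_occ[("female", "writer")])  # 1
```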
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Wikidata dump retrieved from https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2 on 27 Dec 2017
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
WikiSnap25 is a collection of analysis‑ready tables that integrate content, metadata, and readership information for English Wikipedia articles and Wikidata entities as of mid‑2025. The goal is to lower the engineering barrier for empirical Wikimedia research by abstracting away the work of parsing XML/JSON dumps and aggregating pageview data into compact, well‑documented Parquet datasets.
WikiSnap25 combines several official Wikimedia sources: the June 1, 2025 English Wikipedia pages‑articles‑multistream XML dump for article text and metadata, the June 2, 2025 full Wikidata JSON dump for entity‑level information, a decade of Wikimedia REST API monthly pageview data from July 2015 through June 2025, and additional dumps (e.g., stub meta‑history and redirect mappings) for revision context and canonicalization. A modular processing pipeline (XML parsing, wikitext processing, mwparserfromhell, and aggregation scripts) produces a consistent article‑level snapshot anchored at mid‑2025, enriched with long‑term readership and knowledge‑graph features.
The following datasets are included:
WikiSnap_2025_WP_Articles.parquet – Article‑level data for 7,011,415 English Wikipedia articles, including normalized titles, total pageviews (2015–2025), first two sentences of article text, outlink metrics (e.g., num_articles_linked_out), and mappings to Wikidata QIDs. This table supports studies of article popularity, connectivity, and entity types.
WikiSnap_2025_WP_Edges.parquet – 314,337,621 intra‑Wikipedia hyperlinks between articles, with from/to article IDs and titles, endpoint pageviews, and a duplicate‑link flag for graph cleaning. This edge list is suitable for large‑scale network analyses of the article link graph.
WikiSnap_2025_WP_Network_Metrics.parquet – Network metrics for articles with intra-links to other Wikipedia articles (7,008,375 rows), including centrality scores, PageRank, traffic‑weighted PageRank, HITS hub/authority, degrees, and Leiden community assignments computed from a deduplicated article link graph.
WikiSnap_2025_WP_Article_Metrics.parquet – Additional article‑level metrics (7,011,415 rows) such as creation timestamp, edit lifespan in days, total revisions, number of unique editors, and a Gini‑style editor inequality index across redirect clusters.
WikiSnap_2025_WD_Entities.parquet – Wikidata entity information for 116,183,072 items, including labels, instance‑of labels, temporal attributes (begin/end year), country of citizenship labels, and sitelinks to English Wikipedia article titles, with disambiguation pages filtered and truthy claims prioritized.
WikiSnap_2025_WD_Metrics.parquet – Wikidata metrics for the same 116M entities, including counts of claims, in/out links, references, qualifiers, notable properties, article‑quality badges, and external identifiers, enabling notability and completeness assessments.
WikiSnap_2025_WD_Humans.parquet – A subset for human entities (P31=Q5) with 12,371,744 rows, containing birth and death years, occupation labels, gender, citizenship, and notability metrics, tailored to demographic and biographical analysis.
These datasets are intended to support a wide range of tasks, including correlating network centrality with pageview patterns, assessing Wikidata notability via property richness and sitelinks, and exploring editorial activity and inequality, without requiring researchers to build their own large‑scale Wikimedia processing pipelines from scratch.
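A typical analysis joins the article table with its network metrics and correlates connectivity with readership. This sketch uses toy frames in place of the Parquet files; the join key (`article_id`) and column names other than those quoted above are assumptions, not the published schema:

```python
# Hedged sketch: the kind of join-and-correlate workflow WikiSnap25 enables.
# Real usage would pd.read_parquet("WikiSnap_2025_WP_Articles.parquet") etc.
import pandas as pd

articles = pd.DataFrame({          # stand-in for WP_Articles
    "article_id": [1, 2, 3],
    "title": ["A", "B", "C"],
    "total_pageviews": [1000, 250, 4000],
})
metrics = pd.DataFrame({           # stand-in for WP_Network_Metrics
    "article_id": [1, 2, 3],
    "pagerank": [0.002, 0.0005, 0.01],
})

merged = articles.merge(metrics, on="article_id", how="inner")
print(merged[["total_pageviews", "pagerank"]].corr().round(3))
```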
Limitations. The current release focuses on English Wikipedia main‑namespace articles as of June 1, 2025; Wikidata content as of early June 2025; and article‑level pageviews aggregated over July 2015–June 2025 without finer temporal granularity. Redirects, stubs, and full revision histories are not included in these tables.
All datasets and the full processing code are released to facilitate reproducible Wikimedia research and to enable cross‑study comparability on a shared, 2025‑anchored snapshot.
License and citation. All WikiSnap25 datasets and associated processing code archived in this record are dedicated to the public domain under the Creative Commons Zero v1.0 Universal (CC0 1.0) public domain dedication. Users are free to reuse, modify, and redistribute the materials without restriction.
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset, "Global Companies from Wikidata", is a curated collection of 3,579 global companies with information sourced from Wikidata.
Key Features of the Dataset:
Dataset Columns:
License: CC0 1.0 (https://choosealicense.com/licenses/cc0-1.0/)
introspector/wikidata dataset hosted on Hugging Face and contributed by the HF Datasets community
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This dataset is an export of the list of Wikidata elements with a dataset identifier data.gouv.fr (Property P6526).
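An export like this can be reproduced with a SPARQL query against the Wikidata Query Service; P6526 is the property named above, and the label-service language choice is an assumption:

```python
# Hedged sketch: building the SPARQL query that lists items carrying a
# data.gouv.fr dataset identifier (P6526). Sending it to
# https://query.wikidata.org/sparql (format=json) is left to the reader.
def p6526_export_query(limit: int = 100) -> str:
    return f"""
SELECT ?item ?itemLabel ?datasetId WHERE {{
  ?item wdt:P6526 ?datasetId .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en,fr". }}
}}
LIMIT {limit}
""".strip()

print(p6526_export_query(10))
```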
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The dataset comprises over 260,000 scientific references from Wikidata 2023, showcasing the academic impact of each referenced paper. It includes data extracted from the OpenAlex platform, such as DOI, publication year, citation count, domain categorization, and journal metrics like the H-index.
License: CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Profiles of politically exposed persons from Wikidata, the structured data version of Wikipedia.
WikiWebQuestions: a high-quality question answering benchmark for Wikidata.
./training_data/best.json
For more detail see https://github.com/stanford-oval/wikidata-emnlp23.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This is the so-called "truthy" dump of Wikidata from on or about May 21, 2022, shared for usage in SemTab 2022 Challenge.
Downloaded from https://www.wikidata.org/wiki/Wikidata:Database_download
See the License section of the above page for license information.
THIS DATA IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.