100+ datasets found
  1. wikidata

    • huggingface.co
    Updated Apr 3, 2025
  2. Wikidata Dataset

    • paperswithcode.com
    Updated Dec 31, 2023
  3. Wikidata-Disamb Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Feb 5, 2021
  4. wikidata-20220103-all.json.gz

    • academictorrents.com
    bittorrent
    Updated Jan 24, 2022
  5. wikidata-parallel-descriptions-en-ja

    • huggingface.co
    Updated May 20, 2024
  6. Wikidata5M Dataset

    • paperswithcode.com
    Updated Nov 15, 2023
  7. Wikidata Entities of Interest

    • opensanctions.org
    csv
    Updated Dec 6, 2024
  8. Wikidata

    • huggingface.co
    Updated May 23, 2025
  9. Wikidata PageRank

    • danker.s3.amazonaws.com
    Updated Jun 14, 2025
  10. Wikidata

    • bioregistry.io
    Updated Nov 13, 2021
  11. Wikidata-14M Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Jul 12, 2021
  12. Wikidata Reference

    • figshare.com
    application/gzip
    Updated Mar 17, 2025
    + more versions
  13. Wikidata Politically Exposed Persons

    • opensanctions.org
    Updated Jul 9, 2025
  14. Wikidata Companies Graph

    • data.hellenicdataservice.gr
    Updated Jun 20, 2019
  15. wikidata-20240701-all.json.bz2

    • academictorrents.com
    bittorrent
    Updated Aug 30, 2024
  16. Wikidata Dump simple english

    • zenodo.org
    application/gzip, bin +1
    Updated Jun 15, 2023
    + more versions
  17. wikidata-en-descriptions-small

    • huggingface.co
    Updated Aug 5, 2023
    + more versions
  18. Wikidata Constraints Violations - July 2018

    • figshare.com
    txt
    Updated Feb 14, 2019
  19. Wikidata item quality labels

    • figshare.com
    txt
    Updated May 31, 2023
  20. Wikidata Subsetting: Performance and Accuracy Experiment Datasets

    • zenodo.org
    • portalinvestigacion.uniovi.es
    • +1 more
    application/gzip, bin +3
    Updated Jun 9, 2023
Cite
Philippe Saadé (2025). wikidata [Dataset]. https://huggingface.co/datasets/philippesaade/wikidata

wikidata

philippesaade/wikidata

Wikidata Entities Connected to Wikipedia

Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Apr 3, 2025
Authors
Philippe Saadé
License

CC0 1.0 (https://choosealicense.com/licenses/cc0-1.0/)

Description

This dataset is a multilingual, JSON-formatted version of the Wikidata dump from September 18, 2024. It includes only Wikidata entities that are connected to a Wikipedia page in at least one language. The original dump contains 112,467,802 entities, of which 30,072,707 (26.73%) have at least one Wikipedia sitelink.
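The inclusion rule described above (keep an entity only if it has at least one Wikipedia sitelink) can be sketched in Python. This is a minimal illustration, not the curators' actual pipeline. It assumes the standard Wikidata JSON dump layout (a single JSON array with one entity per line, and a `sitelinks` object keyed by site IDs such as `enwiki`). The helper names and the `NON_WIKIPEDIA_WIKI_SITES` exclusion list are hypothetical, and the list is illustrative rather than complete.

```python
import json
import math
from typing import Iterable, Iterator

# HYPOTHETICAL, illustrative (not exhaustive) list of sitelink site IDs that
# end in "wiki" but belong to non-Wikipedia projects. Language Wikipedias use
# IDs such as "enwiki" or "dewiki"; sister projects such as "enwikiquote" or
# "enwikisource" do not end in "wiki", so the suffix test already rejects them.
NON_WIKIPEDIA_WIKI_SITES = frozenset({
    "commonswiki", "specieswiki", "metawiki", "mediawikiwiki",
    "wikidatawiki", "sourceswiki", "wikimaniawiki", "outreachwiki",
})


def has_wikipedia_sitelink(entity: dict) -> bool:
    """Heuristic: True if the entity has at least one Wikipedia sitelink."""
    for site_id in entity.get("sitelinks", {}):
        if site_id.endswith("wiki") and site_id not in NON_WIKIPEDIA_WIKI_SITES:
            return True
    return False


def iter_dump_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield entities from a dump laid out as one JSON array with one
    entity per line (assumed layout of the Wikidata JSON dumps)."""
    for raw in lines:
        line = raw.strip().rstrip(",")  # drop the separator after each entity
        if line in ("", "[", "]"):
            continue  # skip the array brackets and blank lines
        yield json.loads(line)


def linked_share_percent(linked: int, total: int) -> float:
    """Percentage of entities that have a Wikipedia sitelink."""
    return 100.0 * linked / total


def truncate_2dp(value: float) -> float:
    """Cut (not round) a non-negative value to two decimal places."""
    return math.floor(value * 100) / 100
```

Applied to the counts quoted above, `linked_share_percent(30_072_707, 112_467_802)` is about 26.739. The reported 26.73% matches truncating that value to two decimals (`truncate_2dp`); conventional rounding would give 26.74%.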

Curated by: Jonathan Fraine & Philippe Saadé, Wikimedia Deutschland… See the full description on the dataset page: https://huggingface.co/datasets/philippesaade/wikidata.
