100+ datasets found
  1. wikidata-all

    • huggingface.co
    Updated Mar 13, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Wikimedia Movement (2024). wikidata-all [Dataset]. https://huggingface.co/datasets/Wikimedians/wikidata-all
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 13, 2024
    Dataset provided by
    Wikimedia movementhttps://wikimedia.org/
    Authors
    Wikimedia Movement
    License

    https://choosealicense.com/licenses/cc0-1.0/https://choosealicense.com/licenses/cc0-1.0/

    Description

    Wikidata - All Entities

    This Hugging Face Data Set contains the entirety of Wikidata as of the date listed below. Wikidata is a freely licensed structured knowledge graph following the wiki model of user contributions. If you build on this data please consider contributing back to Wikidata. For more on the size and other statistics of Wikidata, see: Special:Statistics. Current Dump as of: 2024-03-04

      Original Source
    

    The data contained in this repository is retrieved… See the full description on the dataset page: https://huggingface.co/datasets/Wikimedians/wikidata-all.

  2. h

    wikidata-parallel-descriptions-en-ja

    • huggingface.co
    Updated May 20, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    elanmitsua (2024). wikidata-parallel-descriptions-en-ja [Dataset]. https://huggingface.co/datasets/Mitsua/wikidata-parallel-descriptions-en-ja
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 20, 2024
    Authors
    elanmitsua
    License

    https://choosealicense.com/licenses/cc0-1.0/https://choosealicense.com/licenses/cc0-1.0/

    Description

    Wikidata parallel descriptions en-ja

    Parallel corpus for machine translation generated from wikidata dump (2024-05-06). Currently we processed only English/Japanese pair. The jsonl file is ready-to-train by Hugging Face transformers trainer for translation tasks.

      Dataset Details
    

    https://www.wikidata.org/wiki/Wikidata:Database_download

      Dataset Creation
    

    As Wikidata description field does not represent exact direct translation, filtering is required for… See the full description on the dataset page: https://huggingface.co/datasets/Mitsua/wikidata-parallel-descriptions-en-ja.

  3. Wikidata Entities of Interest

    • opensanctions.org
    csv
    Updated Dec 6, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Wikidata (2024). Wikidata Entities of Interest [Dataset]. https://www.opensanctions.org/datasets/wd_curated/
    Explore at:
    csvAvailable download formats
    Dataset updated
    Dec 6, 2024
    Dataset authored and provided by
    Wikidata//wikidata.org/
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Persons of interest profiles from Wikidata, the structured data version of Wikipedia.

  4. h

    wikidata-en-descriptions-small

    • huggingface.co
    Updated Aug 5, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Daniel Erenrich (2023). wikidata-en-descriptions-small [Dataset]. https://huggingface.co/datasets/derenrich/wikidata-en-descriptions-small
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 5, 2023
    Authors
    Daniel Erenrich
    License

    Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    derenrich/wikidata-en-descriptions-small dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. a

    Wikidata PageRank

    • danker.s3.amazonaws.com
    Updated Nov 13, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Andreas Thalhammer (2025). Wikidata PageRank [Dataset]. https://danker.s3.amazonaws.com/index.html
    Explore at:
    tsv, application/n-triples, application/vnd.hdt, ttlAvailable download formats
    Dataset updated
    Nov 13, 2025
    Authors
    Andreas Thalhammer
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Regularly published dataset of PageRank scores for Wikidata entities. The underlying link graph is formed by a union of all links accross all Wikipedia language editions. Computation is performed Andreas Thalhammer with 'danker' available at https://github.com/athalhammer/danker . If you find the downloads here useful please feel free to leave a GitHub ⭐ at the repository and buy me a ☕ https://www.buymeacoffee.com/thalhamm

  6. Wikidata dump 2017-12-27

    • zenodo.org
    bz2
    Updated Jan 24, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    WikiData; WikiData (2020). Wikidata dump 2017-12-27 [Dataset]. http://doi.org/10.5281/zenodo.1211767
    Explore at:
    bz2Available download formats
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    WikiData; WikiData
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description
  7. Z

    Wikidata dump from 2018-12-17 in JSON

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 15, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Klímek, Jakub; Škoda, Petr (2021). Wikidata dump from 2018-12-17 in JSON [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4436355
    Explore at:
    Dataset updated
    Jan 15, 2021
    Dataset provided by
    Charles University
    Authors
    Klímek, Jakub; Škoda, Petr
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a dump from Wikidata from 2018-12-17 in JSON. This one is not avavailable anymore from Wikidata. It was downloaded originally from https://dumps.wikimedia.org/other/wikidata/20181217.json.gz and recompressed to fit on Zenodo.

  8. wikidata-20240902-all.json.bz2

    • academictorrents.com
    bittorrent
    Updated Sep 5, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Wikidata Contributors (2024). wikidata-20240902-all.json.bz2 [Dataset]. https://academictorrents.com/details/7bee8ece634c55ab4ed7da5a56dd81578729ed2b
    Explore at:
    bittorrent(91964359511)Available download formats
    Dataset updated
    Sep 5, 2024
    Dataset provided by
    Wikidata//wikidata.org/
    Authors
    Wikidata Contributors
    License

    https://academictorrents.com/nolicensespecifiedhttps://academictorrents.com/nolicensespecified

    Description

    A BitTorrent file to download data with the title 'wikidata-20240902-all.json.bz2'

  9. Wikidata Persons in Relevant Categories

    • opensanctions.org
    Updated Nov 13, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Wikidata (2025). Wikidata Persons in Relevant Categories [Dataset]. https://www.opensanctions.org/datasets/wd_categories/
    Explore at:
    Dataset updated
    Nov 13, 2025
    Dataset authored and provided by
    Wikidata//wikidata.org/
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Category-based imports from Wikidata, the structured data version of Wikipedia.

  10. Z

    Wikidata Causal Event Triple Data

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 7, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sola; Debarun; Oktie (2023). Wikidata Causal Event Triple Data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7196048
    Explore at:
    Dataset updated
    Feb 7, 2023
    Dataset provided by
    Shirai
    Bhattacharjya
    Hassanzadeh
    Authors
    Sola; Debarun; Oktie
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains triples curated from Wikidata surrounding news events with causal relations, and is released as part of our WWW'23 paper, "Event Prediction using Case-Based Reasoning over Knowledge Graphs".

    Starting from a set of classes that we consider to be types of "events", we queried Wikidata to collect entities that were an instanceOf an event class and that were connected to another such event entity by a causal triple (https://www.wikidata.org/wiki/Wikidata:List_of_properties/causality). For all such cause-effect event pairs, we then collected a 3-hop neighborhood of outgoing triples.

  11. Wikidata item quality labels

    • figshare.com
    txt
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Glorian Yapinus; Amir Sarabadani; Aaron Halfaker (2023). Wikidata item quality labels [Dataset]. http://doi.org/10.6084/m9.figshare.5035796.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Glorian Yapinus; Amir Sarabadani; Aaron Halfaker
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains quality labels for 5000 Wikidata items applied by Wikidata editors. The labels correspond to the quality scale described at https://www.wikidata.org/wiki/Wikidata:Item_quality Each line is a JSON blob with the following fields: - item_quality: The labeled quality class (A-E)- rev_id: the revision identifier of the version of the item that was labeled- strata: The size of the item in bytes at the time it was sampled- page_len: The actual size of the item in bytes- page_title: The Qid of the item- claims: A dictionary including P31 "instance-of" values for filtering out certain types of itemsThe # of observations by class is: - A class: 322- B class: 438- C class: 1773- D class: 997- E class: 1470

  12. Wikidata Companies Graph

    • zenodo.org
    • data.hellenicdataservice.gr
    application/gzip
    Updated Aug 5, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Pantelis Chronis; Pantelis Chronis (2020). Wikidata Companies Graph [Dataset]. http://doi.org/10.5281/zenodo.3971752
    Explore at:
    application/gzipAvailable download formats
    Dataset updated
    Aug 5, 2020
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Pantelis Chronis; Pantelis Chronis
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains information about commercial organizations (companies) and their relations with other commercial organizations, persons, products, locations, groups and industries. The dataset has the form of a graph. It has been produced by the SmartDataLake project (https://smartdatalake.eu), using data collected from Wikidata (https://www.wikidata.org).

  13. h

    freebase-wikidata-mapping

    • huggingface.co
    Updated Mar 28, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Knowledge Discovery & Management Lab, DA-IICT (2024). freebase-wikidata-mapping [Dataset]. https://huggingface.co/datasets/kdm-daiict/freebase-wikidata-mapping
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 28, 2024
    Dataset authored and provided by
    Knowledge Discovery & Management Lab, DA-IICT
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    mapping between freebase and wikidata entities

    This dataset maps freebase ids to wikidata ids and labels. It is useful for visualising and better understanding when working with datasets like fb15k-237 How it was created:

    Download freebase-wikidata mapping from here. [compressed size: 21.2 MB] Download wikidata entities data from here. [compressed size: 81GB] Align labels with the freebase,wikidata id

  14. Wikidata Politically Exposed Persons

    • opensanctions.org
    Updated Nov 9, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Wikidata (2025). Wikidata Politically Exposed Persons [Dataset]. https://www.opensanctions.org/datasets/wd_peps/
    Explore at:
    application/json+ftmAvailable download formats
    Dataset updated
    Nov 9, 2025
    Dataset authored and provided by
    Wikidata//wikidata.org/
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Profiles of politically exposed persons from Wikidata, the structured data version of Wikipedia.

  15. WikiData - Datasets - OpenData.eol.org

    • opendata.eol.org
    Updated Mar 22, 2017
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    eol.org (2017). WikiData - Datasets - OpenData.eol.org [Dataset]. https://opendata.eol.org/dataset/wikidata
    Explore at:
    Dataset updated
    Mar 22, 2017
    Dataset provided by
    Encyclopedia of Lifehttp://eol.org/
    License

    Open Data Commons Attribution License (ODC-By) v1.0https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    For questions or use cases calling for large, multi-use aggregate data files, please visit the EOL Services forum at http://discuss.eol.org/c/eol-services read more

  16. t

    Wikidata - Dataset - LDM

    • service.tib.eu
    • resodate.org
    Updated Dec 3, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). Wikidata - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/wikidata
    Explore at:
    Dataset updated
    Dec 3, 2024
    Description

    The dataset used in the paper is Wikidata, which contains a large number of entities and their corresponding semantic types.

  17. Z

    wikidata-20180813-all.json.bz2

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Wikidata (2020). wikidata-20180813-all.json.bz2 [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_3268724
    Explore at:
    Dataset updated
    Jan 24, 2020
    Authors
    Wikidata
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    A copy of a dump which was available from WikiMedia: https://dumps.wikimedia.org/wikidatawiki/entities/

  18. b

    Wikidata Property

    • bioregistry.io
    • semantic.farm
    Updated Nov 16, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2021). Wikidata Property [Dataset]. http://identifiers.org/biolink:WIKIDATA_PROPERTY
    Explore at:
    Dataset updated
    Nov 16, 2021
    License

    https://bioregistry.io/spdx:CC0-1.0https://bioregistry.io/spdx:CC0-1.0

    Description

    Wikidata is a free and open knowledge base that can be read and edited by both humans and machines. Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others.

  19. h

    wikidata-enwiki-categories-and-statements

    • huggingface.co
    Updated Mar 26, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Daniel Erenrich (2025). wikidata-enwiki-categories-and-statements [Dataset]. https://huggingface.co/datasets/derenrich/wikidata-enwiki-categories-and-statements
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 26, 2025
    Authors
    Daniel Erenrich
    License

    Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    derenrich/wikidata-enwiki-categories-and-statements dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. Dump of Wikidata of January 1st, 2024.

    • academictorrents.com
    bittorrent
    Updated Jan 4, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    wikidata.org (2024). Dump of Wikidata of January 1st, 2024. [Dataset]. https://academictorrents.com/details/0852ef544a4694995fcbef7132477c688ded7d9a
    Explore at:
    bittorrent(130532238059)Available download formats
    Dataset updated
    Jan 4, 2024
    Dataset provided by
    Wikidata//wikidata.org/
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Wikidata: A Free and Open Knowledge Base Accessible: Readable and editable by humans and machines alike. Central Hub: Serves as the core storage for structured data across Wikimedia s sister projects, such as Wikipedia, Wikivoyage, Wiktionary, Wikisource, and more. Current Edition: This torrent represents an unofficial dump of Wikidata as of January 1st, 2024.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Wikimedia Movement (2024). wikidata-all [Dataset]. https://huggingface.co/datasets/Wikimedians/wikidata-all
Organization logo

wikidata-all

Wikidata - All Entities

Wikimedians/wikidata-all

Explore at:
298 scholarly articles cite this dataset (View in Google Scholar)
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 13, 2024
Dataset provided by
Wikimedia movementhttps://wikimedia.org/
Authors
Wikimedia Movement
License

https://choosealicense.com/licenses/cc0-1.0/https://choosealicense.com/licenses/cc0-1.0/

Description

Wikidata - All Entities

This Hugging Face Data Set contains the entirety of Wikidata as of the date listed below. Wikidata is a freely licensed structured knowledge graph following the wiki model of user contributions. If you build on this data please consider contributing back to Wikidata. For more on the size and other statistics of Wikidata, see: Special:Statistics. Current Dump as of: 2024-03-04

  Original Source

The data contained in this repository is retrieved… See the full description on the dataset page: https://huggingface.co/datasets/Wikimedians/wikidata-all.

Search
Clear search
Close search
Google apps
Main menu