100+ datasets found
  1. wikidata-extraction

    • huggingface.co
    Cite
    Piet, wikidata-extraction [Dataset]. https://huggingface.co/datasets/piebro/wikidata-extraction
    Explore at:
    Authors
    Piet
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Wikidata Extraction

    This dataset contains all RDF triples extracted from the latest Wikidata, converted from the N-Triples format to Parquet. The data originates from Wikidata, a free and open knowledge base that acts as central storage for structured data used by Wikipedia and other Wikimedia projects. The source file is the "truthy" N-Triples dump (latest-truthy.nt.bz2), which contains only the current, non-deprecated statements. The code to extract this data is available at… See the full description on the dataset page: https://huggingface.co/datasets/piebro/wikidata-extraction.
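
    As a rough illustration of what moving from N-Triples to a columnar layout involves, the sketch below splits N-Triples lines into subject, predicate, and object columns using only the Python standard library. It is not the dataset author's extraction code; the function names, the simplified pattern, and the sample lines are assumptions made for illustration, and writing real Parquet files would need an additional library.

```python
import re

# Simplified N-Triples pattern (an assumption, not the full grammar):
# the subject is an IRI or blank node, the predicate is an IRI, and the
# object is whatever precedes the terminating " ."
_TRIPLE = re.compile(r'^(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.*?)\s*\.\s*$')


def parse_ntriples_line(line):
    """Return (subject, predicate, object), or None for blank/comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = _TRIPLE.match(line)
    if match is None:
        raise ValueError(f"not a valid N-Triples line: {line!r}")
    return match.groups()


def to_columns(lines):
    """Collect parsed triples into three parallel column lists."""
    columns = {"subject": [], "predicate": [], "object": []}
    for line in lines:
        triple = parse_ntriples_line(line)
        if triple is None:
            continue
        # Dict keys iterate in insertion order: subject, predicate, object.
        for name, value in zip(columns, triple):
            columns[name].append(value)
    return columns


# Sample lines written for this sketch, not copied from the dump.
sample = [
    '<http://www.wikidata.org/entity/Q42> <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q5> .',
    '# a comment line',
    '<http://www.wikidata.org/entity/Q42> <http://www.w3.org/2000/01/rdf-schema#label> "Douglas Adams"@en .',
]
columns = to_columns(sample)
print(columns["object"])
```

    Keeping the three terms in separate columns is what makes a columnar format attractive here: a query that only needs predicates can skip the other two columns entirely.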

  2. Freebase/Wikidata Mappings

    • kaggle.com
    zip
    Updated Mar 13, 2026
    Cite
    Dhruv Bansal (2026). Freebase/Wikidata Mappings [Dataset]. https://www.kaggle.com/datasets/dhruvb2028/freebasewikidata-mappings
    Explore at:
    Available download formats: zip (21894706 bytes)
    Dataset updated
    Mar 13, 2026
    Authors
    Dhruv Bansal
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset provides entity mappings between Freebase and Wikidata, enabling seamless integration between two large-scale knowledge graphs. It is based on the Wikidata data dump from October 28, 2013, and was originally published by Google under the CC0 (Public Domain) license.

    The mappings are carefully filtered to ensure high reliability:

    • Each mapping includes at least two shared Wikipedia links
    • There are no conflicting Wikipedia links

    This strict filtering results in high-confidence entity alignments, making the dataset useful for research and real-world applications in knowledge graph systems.
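
    The two filtering rules can be sketched as a small predicate. The data layout (a dict from language code to Wikipedia page title for each entity) and the reading of "conflicting" as "same language edition, different page" are assumptions made for illustration; the description does not say how the original filter was implemented.

```python
def is_reliable_mapping(freebase_links, wikidata_links, min_shared=2):
    """Apply both rules to one candidate Freebase/Wikidata pair.

    Each argument maps a language code (e.g. "en") to a Wikipedia page
    title. This layout is hypothetical.
    """
    common_langs = freebase_links.keys() & wikidata_links.keys()
    # A link is "shared" when both entities point to the same page
    # in the same language edition.
    shared = sum(
        1 for lang in common_langs
        if freebase_links[lang] == wikidata_links[lang]
    )
    # Assumed meaning of "conflicting": same language, different page.
    conflicts = len(common_langs) - shared
    return shared >= min_shared and conflicts == 0


good = is_reliable_mapping(
    {"en": "Douglas Adams", "de": "Douglas Adams"},
    {"en": "Douglas Adams", "de": "Douglas Adams", "fr": "Douglas Adams"},
)
one_link = is_reliable_mapping({"en": "Paris"}, {"en": "Paris"})
conflicting = is_reliable_mapping(
    {"en": "Paris", "de": "Paris", "fr": "Paris (mythologie)"},
    {"en": "Paris", "de": "Paris", "fr": "Paris"},
)
print(good, one_link, conflicting)
```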

  3. wikidata-title-desc

    • huggingface.co
    Updated Mar 30, 2026
    Cite
    Wikimedia (2026). wikidata-title-desc [Dataset]. https://huggingface.co/datasets/wikimedia/wikidata-title-desc
    Explore at:
    Dataset updated
    Mar 30, 2026
    Dataset provided by
    Wikimedia Foundation http://www.wikimedia.org/
    Authors
    Wikimedia
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Wikidata Title and Description

    Wikidata entity titles and short descriptions for all 324 Wikipedia language editions, extracted from the Wikimedia Analytics wmf.wikidata_entity table. Each row links a Wikidata QID to the title and description as they appear in a specific language edition of Wikipedia.

    Dataset Details

    Field          Value
    Source         wmf.wikidata_entity (Wikidata + Wikipedia sitelinks)
    Snapshot       2026-03-30
    Languages      324
    Total titles   88,292,409

    Titles… See the full description on the dataset page: https://huggingface.co/datasets/wikimedia/wikidata-title-desc.

  4. Wikidata Reference

    • figshare.com
    gz
    Updated Mar 17, 2025
    + more versions
    Cite
    Sven Hertling; Nandana Mihindukulasooriya (2025). Wikidata Reference [Dataset]. http://doi.org/10.6084/m9.figshare.28602170.v2
    Explore at:
    Available download formats: gz
    Dataset updated
    Mar 17, 2025
    Dataset provided by
    Figshare http://figshare.com/
    Authors
    Sven Hertling; Nandana Mihindukulasooriya
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Summary

    The Triple-to-Text Alignment dataset aligns Knowledge Graph (KG) triples from Wikidata with diverse, real-world textual sources extracted from the web. Unlike previous datasets that rely primarily on Wikipedia text, this dataset provides a broader range of writing styles, tones, and structures by leveraging Wikidata references from various sources such as news articles, government reports, and scientific literature. Large language models (LLMs) were used to extract and validate text spans corresponding to KG triples, ensuring high-quality alignments. The dataset can be used for training and evaluating relation extraction (RE) and knowledge graph construction systems.

    Data Fields

    Each row in the dataset consists of the following fields:

    • subject (str): The subject entity of the knowledge graph triple.
    • rel (str): The relation that connects the subject and object.
    • object (str): The object entity of the knowledge graph triple.
    • text (str): A natural language sentence that entails the given triple.
    • validation (str): LLM-based validation results, including:
      • Fluent Sentence(s): TRUE/FALSE
      • Subject mentioned in Text: TRUE/FALSE
      • Relation mentioned in Text: TRUE/FALSE
      • Object mentioned in Text: TRUE/FALSE
      • Fact Entailed By Text: TRUE/FALSE
      • Final Answer: TRUE/FALSE
    • reference_url (str): URL of the web source from which the text was extracted.
    • subj_qid (str): Wikidata QID for the subject entity.
    • rel_id (str): Wikidata Property ID for the relation.
    • obj_qid (str): Wikidata QID for the object entity.

    Dataset Creation

    The dataset was created through the following process:

    1. Triple-Reference Sampling and Extraction
      • All relations from Wikidata were extracted using SPARQL queries.
      • A sample of KG triples with associated reference URLs was collected for each relation.
    2. Domain Analysis and Web Scraping
      • URLs were grouped by domain, and sampled pages were analyzed to determine their primary language.
      • English-language web pages were scraped and processed to extract plaintext content.
    3. LLM-Based Text Span Selection and Validation
      • LLMs were used to identify text spans from web content that correspond to KG triples.
      • A Chain-of-Thought (CoT) prompting method was applied to validate whether the extracted text entailed the triple.
      • The validation process included checking for fluency, subject mention, relation mention, object mention, and final entailment.
    4. Final Dataset Statistics
      • 12.5K Wikidata relations were analyzed, leading to 3.3M triple-reference pairs.
      • After filtering for English content, 458K triple-web content pairs were processed with LLMs.
      • 80.5K validated triple-text alignments were included in the final dataset.
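
    The row layout described above can be written down as a small typed record. This is only a sketch: the field names and types come from the description, while the class name, the identifier check, and the example row (including the exact wording of its validation string) are invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class AlignmentRow:
    # Field names and types follow the dataset description.
    subject: str
    rel: str
    object: str
    text: str
    validation: str
    reference_url: str
    subj_qid: str
    rel_id: str
    obj_qid: str


# Wikidata items are "Q" + digits and properties are "P" + digits.
_QID = re.compile(r"Q[1-9]\d*")
_PID = re.compile(r"P[1-9]\d*")


def has_wellformed_ids(row):
    """Check that the three identifier fields look like Wikidata IDs."""
    return bool(
        _QID.fullmatch(row.subj_qid)
        and _PID.fullmatch(row.rel_id)
        and _QID.fullmatch(row.obj_qid)
    )


# Hypothetical row, written for this sketch and not taken from the dataset.
row = AlignmentRow(
    subject="Douglas Adams",
    rel="place of birth",
    object="Cambridge",
    text="Douglas Adams was born in Cambridge.",
    validation="Final Answer: TRUE",
    reference_url="https://example.org/article",
    subj_qid="Q42",
    rel_id="P19",
    obj_qid="Q350",
)
print(has_wellformed_ids(row))
```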

  5. Wikidata dump from 2018-12-17 in JSON

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    gz
    Updated Jan 15, 2021
    Cite
    Jakub Klímek; Petr Škoda (2021). Wikidata dump from 2018-12-17 in JSON [Dataset]. http://doi.org/10.5281/zenodo.4436356
    Explore at:
    Available download formats: gz
    Dataset updated
    Jan 15, 2021
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Jakub Klímek; Petr Škoda
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a Wikidata dump from 2018-12-17 in JSON. It is no longer available from Wikidata. It was originally downloaded from https://dumps.wikimedia.org/other/wikidata/20181217.json.gz and recompressed to fit on Zenodo.

  6. Wikidata Causal Event Triple Data

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Feb 7, 2023
    Cite
    Sola; Debarun; Oktie (2023). Wikidata Causal Event Triple Data [Dataset]. http://doi.org/10.5281/zenodo.7196049
    Explore at:
    Available download formats: bin
    Dataset updated
    Feb 7, 2023
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Sola; Debarun; Oktie
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains triples curated from Wikidata surrounding news events with causal relations, and is released as part of our WWW'23 paper, "Event Prediction using Case-Based Reasoning over Knowledge Graphs".

    Starting from a set of classes that we consider to be types of "events", we queried Wikidata to collect entities that were an instanceOf an event class and that were connected to another such event entity by a causal triple (https://www.wikidata.org/wiki/Wikidata:List_of_properties/causality). For all such cause-effect event pairs, we then collected a 3-hop neighborhood of outgoing triples.
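
    The neighborhood collection step can be pictured as a bounded breadth-first expansion over outgoing edges. The sketch below assumes triples are already available in memory as (subject, property, object) tuples; the function name and the toy graph are placeholders, and the authors' actual collection ran as queries against Wikidata rather than over a local list.

```python
from collections import defaultdict


def outgoing_neighborhood(triples, seeds, max_hops=3):
    """Collect triples reachable from the seeds via outgoing edges only."""
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((s, p, o))

    collected = []
    visited = set(seeds)
    frontier = set(seeds)
    for _ in range(max_hops):
        next_frontier = set()
        for node in sorted(frontier):
            for triple in by_subject.get(node, []):
                collected.append(triple)
                obj = triple[2]
                # Each node is expanded at most once.
                if obj not in visited:
                    visited.add(obj)
                    next_frontier.add(obj)
        frontier = next_frontier
    return collected


# Toy graph with placeholder identifiers; E1 and E2 stand in for a
# cause-effect event pair.
toy = [
    ("E1", "has_effect", "E2"),
    ("E1", "p", "A"),
    ("E2", "p", "A"),
    ("A", "p", "B"),
    ("B", "p", "C"),
    ("C", "p", "D"),  # four hops away from the seeds, so excluded
]
result = outgoing_neighborhood(toy, {"E1", "E2"})
print(len(result))
```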

  7. Wikidata Dump gkg

    • datasetcatalog.nlm.nih.gov
    Updated Oct 28, 2021
    Cite
    Fünfstück, Benno (2021). Wikidata Dump gkg [Dataset]. http://doi.org/10.5281/zenodo.5610072
    Explore at:
    Dataset updated
    Oct 28, 2021
    Authors
    Fünfstück, Benno
    Description

    RDF dump of wikidata produced with wdumps. Mappings to Google Knowledge Graph, both Freebase and new.

    View on wdumper

    entity count: 0, statement count: 0, triple count: 0

  8. Wikidata dump 2017-12-27

    • zenodo.org
    bz2
    Updated Jan 24, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    WikiData (2020). Wikidata dump 2017-12-27 [Dataset]. http://doi.org/10.5281/zenodo.1211767
    Explore at:
    Available download formats: bz2
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    WikiData
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description
  9. Kensho Derived Wikimedia Dataset

    • kaggle.com
    zip
    Updated Jan 24, 2020
    Cite
    Kensho R&D (2020). Kensho Derived Wikimedia Dataset [Dataset]. https://www.kaggle.com/datasets/kenshoresearch/kensho-derived-wikimedia-data/discussion
    Explore at:
    Available download formats: zip (8760044227 bytes)
    Dataset updated
    Jan 24, 2020
    Authors
    Kensho R&D
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Kensho Derived Wikimedia Dataset

    Wikipedia, the free encyclopedia, and Wikidata, the free knowledge base, are crowd-sourced projects supported by the Wikimedia Foundation. Wikipedia is nearly 20 years old and recently added its six millionth article in English. Wikidata, its younger machine-readable sister project, was created in 2012 but has been growing rapidly and currently contains more than 75 million items.

    These projects contribute to the Wikimedia Foundation's mission of empowering people to develop and disseminate educational content under a free license. They are also heavily utilized by computer science research groups, especially those interested in natural language processing (NLP). The Wikimedia Foundation periodically releases snapshots of the raw data backing these projects, but these are in a variety of formats and were not designed for use in NLP research. In the Kensho R&D group, we spend a lot of time downloading, parsing, and experimenting with this raw data. The Kensho Derived Wikimedia Dataset (KDWD) is a condensed subset of the raw Wikimedia data in a form that we find helpful for NLP work. The KDWD has a CC BY-SA 3.0 license, so feel free to use it in your work too.

    This particular release consists of two main components - a link annotated corpus of English Wikipedia pages and a compact sample of the Wikidata knowledge base. We version the KDWD using the raw Wikimedia snapshot dates. The version string for this dataset is kdwd_enwiki_20191201_wikidata_20191202 indicating that this KDWD was built from the English Wikipedia snapshot from 2019 December 1 and the Wikidata snapshot from 2019 December 2. Below we describe these components in more detail.
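
    Since the version string encodes both snapshot dates, it can be unpacked mechanically. The sketch below assumes every KDWD version string follows the same pattern as the one quoted above; the function name is hypothetical and not part of any Kensho tooling.

```python
import re
from datetime import date, datetime

# Pattern inferred from the single version string quoted in the description.
_VERSION = re.compile(r"kdwd_enwiki_(\d{8})_wikidata_(\d{8})")


def parse_kdwd_version(version):
    """Return the Wikipedia and Wikidata snapshot dates from a version string."""
    match = _VERSION.fullmatch(version)
    if match is None:
        raise ValueError(f"unexpected version string: {version!r}")
    enwiki, wikidata = (
        datetime.strptime(group, "%Y%m%d").date() for group in match.groups()
    )
    return {"enwiki": enwiki, "wikidata": wikidata}


snapshots = parse_kdwd_version("kdwd_enwiki_20191201_wikidata_20191202")
print(snapshots["enwiki"].isoformat(), snapshots["wikidata"].isoformat())
```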

    Example Notebooks

    Dive right in by checking out some of our example notebooks:

    Updates / Changelog

    • initial release 2020-01-31

    File Summary

    • Wikipedia
      • page.csv (page metadata and Wikipedia-to-Wikidata mapping)
      • link_annotated_text.jsonl (plaintext of Wikipedia pages with link offsets)
    • Wikidata
      • item.csv (item labels and descriptions in English)
      • item_aliases.csv (item aliases in English)
      • property.csv (property labels and descriptions in English)
      • property_aliases.csv (property aliases in English)
      • statements.csv (truthy qpq statements)

    Three Layers of Data

    The KDWD is three connected layers of data. The base layer is a plain text English Wikipedia corpus, the middle layer annotates the corpus by indicating which text spans are links, and the top layer connects the link text spans to items in Wikidata. Below we'll describe these layers in more detail.

    Wikipedia Sample

    The first part of the KDWD is derived from Wikipedia. In order to create a corpus of mostly natural text, we restrict our English Wikipedia page sample to those that:

  10. Wikidata Human Gender Indicators

    • figshare.com
    txt
    Updated May 31, 2023
    Cite
    Max Klein; Piotr Konieczny; Harsh Gupta; Vivek Rai; Haiyi Zhu (2023). Wikidata Human Gender Indicators [Dataset]. http://doi.org/10.6084/m9.figshare.3100903.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare http://figshare.com/
    Authors
    Max Klein; Piotr Konieczny; Harsh Gupta; Vivek Rai; Haiyi Zhu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a collection of Gender Indicators from Wikidata and Wikipedia of Human Biographies. Data is derived from the 2016-01-03 Wikidata snapshot.

    Each file describes the humans in Wikidata aggregated by Gender (Property:P21) and disaggregated by the following Wikidata Properties:

    • Date of Birth (P569)
    • Date of Death (P570)
    • Place of Birth (P19)
    • Country of Citizenship (P27)
    • Ethnic Group (P172)
    • Field of Work (P101)
    • Occupation (P106)
    • Wikipedia Language ("Sitelinks")

    Further aggregations of the data are:

    • World Map (Countries derived from place of birth and citizenship)
    • World Cultures (Inglehart Welzel Map applied to World Map)
    • Gender Co-Occurrence (Humans with multiple genders)

    Wikidata labels have been translated to English for convenience when possible. You may still see values with "QIDs", which means there was no English translation possible. Where there were multiple values, such as for occupation, we count the gender as co-occurring with each occupation separately.

    For more information, see http://wigi.wmflabs.org/
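
    The counting rule for multi-valued properties can be made concrete with a short sketch. The record layout (a dict from property ID to a list of labels) and the sample people are invented for illustration and do not reflect the format of the published files.

```python
from collections import Counter


def count_gender_by_property(humans, prop):
    """Count each (gender, value) pair once per human.

    A human with several values for `prop` contributes to every one of
    them, and a human with several genders contributes under each gender.
    """
    counts = Counter()
    for human in humans:
        for gender in human.get("P21", []):
            for value in human.get(prop, []):
                counts[(gender, value)] += 1
    return counts


# Hypothetical records keyed by Wikidata property ID.
humans = [
    {"P21": ["female"], "P106": ["writer", "physicist"]},
    {"P21": ["male"], "P106": ["writer"]},
    {"P21": ["female"], "P106": []},
]
counts = count_gender_by_property(humans, "P106")
print(counts[("female", "writer")], counts[("female", "physicist")])
```

    Note that under this rule the per-occupation totals can add up to more than the number of humans, because one person with two occupations is counted twice.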

  11. wikidata

    • huggingface.co
    Updated Feb 17, 2026
    + more versions
    Cite
    The introspector project (2026). wikidata [Dataset]. https://huggingface.co/datasets/introspector/wikidata
    Explore at:
    Dataset updated
    Feb 17, 2026
    Dataset authored and provided by
    The introspector project
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    introspector/wikidata dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. wikidata-20180813-all.json.bz2

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Cite
    Wikidata (2020). wikidata-20180813-all.json.bz2 [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_3268724
    Explore at:
    Dataset updated
    Jan 24, 2020
    Authors
    Wikidata
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    A copy of a dump that was available from Wikimedia: https://dumps.wikimedia.org/wikidatawiki/entities/

  13. Wikidata

    • live.european-language-grid.eu
    json
    Updated Oct 28, 2012
    + more versions
    Cite
    (2012). Wikidata [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/7268
    Explore at:
    Available download formats: json
    Dataset updated
    Oct 28, 2012
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Wikidata is a free and open knowledge base that can be read and edited by both humans and machines. Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others.

  14. Topics for each Wikipedia Article across Languages

    • figshare.com
    gz
    Updated Jun 29, 2020
    Cite
    Diego Saez-Trumper (2020). Topics for each Wikipedia Article across Languages [Dataset]. http://doi.org/10.6084/m9.figshare.12127434.v1
    Explore at:
    Available download formats: gz
    Dataset updated
    Jun 29, 2020
    Dataset provided by
    Figshare http://figshare.com/
    Authors
    Diego Saez-Trumper
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the predicted topic(s) for (almost) each Wikipedia article across languages. Each row contains the following columns:

    Qid,topic,probability,page_id,page_title,wiki_db

    Where:

    • Qid: Wikidata item ID
    • topic: Topic based on the ORES draft topic model (https://www.mediawiki.org/wiki/Talk:ORES/Draft_topic)
    • probability: Probability of belonging to the topic
    • page_id: page_id
    • page_title: page_title
    • wiki_db: wiki_db; for example, English Wikipedia is enwiki

    For example:

    Q1000211,Geography.Regions.Europe.Western_Europe,1.0,166578,Frières-Faillouël,euwiki

    Topics are predicted using the Wikidata-Topic model developed by Isaac Johnson (https://github.com/geohci/wikidata-topic-model).

    The source code to create this dataset can be found here: https://github.com/digitalTranshumant/wikidata-topic-model
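
    A row in this layout can be read with Python's csv module. The sketch below parses the example row quoted in the description; the TopicRow and read_topic_rows names are hypothetical, and it assumes the file has no header line and that probability and page_id are numeric, which the description implies but does not state.

```python
import csv
import io
from typing import NamedTuple


class TopicRow(NamedTuple):
    qid: str
    topic: str
    probability: float
    page_id: int
    page_title: str
    wiki_db: str


def read_topic_rows(handle):
    """Yield one TopicRow per CSV line (assumes no header row)."""
    for qid, topic, probability, page_id, page_title, wiki_db in csv.reader(handle):
        yield TopicRow(qid, topic, float(probability), int(page_id), page_title, wiki_db)


# The example row from the dataset description.
example = "Q1000211,Geography.Regions.Europe.Western_Europe,1.0,166578,Frières-Faillouël,euwiki\n"
row = next(read_topic_rows(io.StringIO(example)))

# Topic labels are hierarchical, with levels separated by dots.
print(row.topic.split("."))
```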

  15. Wikidata Dump dump_10-07-2021

    • zenodo.org
    bin, gz, json
    Updated Oct 9, 2021
    + more versions
    Cite
    Benno Fünfstück (2021). Wikidata Dump dump_10-07-2021 [Dataset]. http://doi.org/10.5281/zenodo.5554664
    Explore at:
    Available download formats: gz, json, bin
    Dataset updated
    Oct 9, 2021
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Benno Fünfstück
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    RDF dump of wikidata produced with wdumps.

    View on wdumper: https://tools.wmflabs.org/wdumps/dump/1752

    entity count: 0, statement count: 0, triple count: 0

  16. AuditLP

    • kaggle.com
    zip
    Updated Jun 13, 2024
    Cite
    Sai Keerthana Karnam (2024). AuditLP [Dataset]. https://www.kaggle.com/datasets/saikeerthanakarnam/wikidata-geographic-datasets
    Explore at:
    Available download formats: zip (1258664611 bytes)
    Dataset updated
    Jun 13, 2024
    Authors
    Sai Keerthana Karnam
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Wikidata is an open-source knowledge base that serves as a repository for structured data used in various Wikimedia projects like Wikipedia and Wikivoyage. Much like other knowledge graphs, Wikidata organizes information into triples, which consist of a subject item, a property, and an object. We extract a specific set of triples from Wikidata based on certain criteria to obtain a curated dataset. We currently present five geographic datasets: Argentina, Australia, India, South Africa, and Russia.

  17. DO wikidata subse

    • figshare.com
    txt
    Updated Nov 11, 2021
    Cite
    Andra Waagmeester (2021). DO wikidata subse [Dataset]. http://doi.org/10.6084/m9.figshare.16990036.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Nov 11, 2021
    Dataset provided by
    Figshare http://figshare.com/
    Authors
    Andra Waagmeester
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is a subset of Wikidata generated with a Shape Expression during the Biohackathon Europe 2021. It was extracted using ShEx.js.

  18. benchmarking-wikidata

    • huggingface.co
    Updated Jan 1, 2000
    Cite
    Marta Kipke (2000). benchmarking-wikidata [Dataset]. https://huggingface.co/datasets/MKipke/benchmarking-wikidata
    Explore at:
    Dataset updated
    Jan 1, 2000
    Authors
    Marta Kipke
    Description

    MKipke/benchmarking-wikidata dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. Wikidata Dump NA

    • zenodo.org
    bin, gz, json
    Updated Jun 12, 2023
    Cite
    Benno Fünfstück (2023). Wikidata Dump NA [Dataset]. http://doi.org/10.5281/zenodo.8025733
    Explore at:
    Available download formats: gz, json, bin
    Dataset updated
    Jun 12, 2023
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Benno Fünfstück
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    RDF dump of wikidata produced with wdumper.


    View on wdumper

    entity count: 425468, statement count: 11624839, triple count: 25332332

  20. Wikidata dump

    • figshare.com
    bin
    Updated Apr 24, 2024
    Cite
    Toporing index (2024). Wikidata dump [Dataset]. http://doi.org/10.6084/m9.figshare.25682832.v2
    Explore at:
    Available download formats: bin
    Dataset updated
    Apr 24, 2024
    Dataset provided by
    Figshare http://figshare.com/
    Authors
    Toporing index
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset corresponds to a preprocessed dump of Wikidata in which all identifiers were mapped to a contiguous alphabet.
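
    One common way to map identifiers to a contiguous alphabet is dictionary encoding, where each distinct term gets the next unused integer. The sketch below shows that idea under stated assumptions: a single shared ID space for subjects, properties, and objects, and IDs assigned in order of first appearance. The description does not say which scheme this dataset actually uses, and the function name is hypothetical.

```python
def encode_triples(triples):
    """Replace every term with a dense integer ID (0, 1, 2, ...)."""
    ids = {}

    def term_id(term):
        # Assign the next free integer the first time a term is seen.
        if term not in ids:
            ids[term] = len(ids)
        return ids[term]

    encoded = [(term_id(s), term_id(p), term_id(o)) for s, p, o in triples]
    return encoded, ids


# Sample triples written for this sketch.
triples = [("Q42", "P31", "Q5"), ("Q42", "P19", "Q350")]
encoded, ids = encode_triples(triples)
print(encoded)
```

    Dense integer IDs of this kind are typically what compact index structures expect, since they can be used directly as array offsets.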

wikidata-extraction

piebro/wikidata-extraction

Explore at:
11 scholarly articles cite this dataset (View in Google Scholar)