100+ datasets found
  1. Wikidata

    • bioregistry.io
    Updated Nov 13, 2021
    Cite
    (2021). Wikidata [Dataset]. http://identifiers.org/biolink:WIKIDATA
    Explore at:
    Dataset updated
    Nov 13, 2021
    Description

    Wikidata is a collaboratively edited knowledge base operated by the Wikimedia Foundation. It is intended to provide a common source of certain types of data which can be used by Wikimedia projects such as Wikipedia. Wikidata functions as a document-oriented database, centred on individual items. Items represent topics, for which basic information is stored that identifies each topic.

  2. Wikidata-Disamb Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Feb 10, 2021
    Cite
    Alberto Cetoli; Mohammad Akbari; Stefano Bragaglia; Andrew D. O'Harney; Marc Sloan (2021). Wikidata-Disamb Dataset [Dataset]. https://paperswithcode.com/dataset/wikidata-disamb
    Explore at:
    Dataset updated
    Feb 10, 2021
    Authors
    Alberto Cetoli; Mohammad Akbari; Stefano Bragaglia; Andrew D. O'Harney; Marc Sloan
    Description

    The Wikidata-Disamb dataset is intended to allow a clean and scalable evaluation of named entity disambiguation (NED) against Wikidata entries, and to be used as a reference in future research.

  3. Wikidata Constraints Violations - July 2018

    • figshare.com
    txt
    Updated Feb 14, 2019
    Cite
    Thomas Pellissier Tanon; Camille Bourgaux (2019). Wikidata Constraints Violations - July 2018 [Dataset]. http://doi.org/10.6084/m9.figshare.7712720.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    Feb 14, 2019
    Dataset provided by
    figshare
    Authors
    Thomas Pellissier Tanon; Camille Bourgaux
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains corrections for Wikidata constraint violations extracted from the July 1st 2018 Wikidata full history dump.

    The following constraints are considered:

    • conflicts with: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Conflicts_with
    • distinct values: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Unique_value
    • inverse and symmetric: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Inverse and https://www.wikidata.org/wiki/Help:Property_constraints_portal/Symmetric
    • item requires statement: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Item
    • one of: https://www.wikidata.org/wiki/Help:Property_constraints_portal/One_of
    • single value: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Single_value
    • type: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Type
    • value requires statement: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Target_required_claim
    • value type: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Value_type

    The constraints.tsv file contains the list of most of the Wikidata constraints considered in this dataset (beware, there could be some discrepancies for type, valueType, itemRequiresClaim and valueRequiresClaim constraints). It is a tab-separated file with the following columns:

    • constraint id: the URI of the Wikidata statement describing the constraint
    • property id: the URI of the property that is constrained
    • type id: the URI of the constraint type (type, value type, ...). It is a Wikidata item.
    • 15 columns for the possible attributes of the constraint. If an attribute has multiple values, they are in the same cell, separated by a space. The columns are:
      • regex: https://www.wikidata.org/wiki/Property:P1793
      • exceptions: https://www.wikidata.org/wiki/Property:P2303
      • group by: https://www.wikidata.org/wiki/Property:P2304
      • items: https://www.wikidata.org/wiki/Property:P2305
      • property: https://www.wikidata.org/wiki/Property:P2306
      • namespace: https://www.wikidata.org/wiki/Property:P2307
      • class: https://www.wikidata.org/wiki/Property:P2308
      • relation: https://www.wikidata.org/wiki/Property:P2309
      • minimal date: https://www.wikidata.org/wiki/Property:P2310
      • maximum date: https://www.wikidata.org/wiki/Property:P2311
      • maximum value: https://www.wikidata.org/wiki/Property:P2312
      • minimal value: https://www.wikidata.org/wiki/Property:P2313
      • status: https://www.wikidata.org/wiki/Property:P2316
      • separator: https://www.wikidata.org/wiki/Property:P4155
      • scope: https://www.wikidata.org/wiki/Property:P5314

    The other files provide, for each constraint type, the list of all corrections extracted from the edit history. The format is one line per correction with the following tab-separated values:

    • URI of the statement describing the constraint in Wikidata
    • URI of the revision that solved the constraint violation
    • subject, predicate and object of the triple that was violating the constraint (separated by a tab)
    • the string "->"
    • subject, predicate and object of the triple(s) of the correction, each followed by "http://wikiba.se/history/ontology#deletion" if the triple has been removed or "http://wikiba.se/history/ontology#addition" if the triple has been added. Each component of these values is separated by a tab.

    More detailed explanations are provided in a soon-to-be-published paper.
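    As an illustration only (not part of the dataset), a line from one of the correction files could be parsed in Python along the following lines. The file name is a placeholder, and the grouping of the correction values into (subject, predicate, object, marker) chunks is an assumption about the layout described above.

    ADDITION = "http://wikiba.se/history/ontology#addition"
    DELETION = "http://wikiba.se/history/ontology#deletion"

    def parse_correction(line):
        # constraint URI, revision URI, violating triple, "->", correction values
        fields = line.rstrip("\n").split("\t")
        constraint_uri, revision_uri = fields[0], fields[1]
        arrow = fields.index("->")
        violating_triple = tuple(fields[2:arrow])              # (s, p, o)
        rest = fields[arrow + 1:]
        # assumption: each correction triple is followed by one addition/deletion marker
        corrections = [tuple(rest[i:i + 4]) for i in range(0, len(rest), 4)]
        added = [c for c in corrections if c[-1] == ADDITION]
        removed = [c for c in corrections if c[-1] == DELETION]
        return constraint_uri, revision_uri, violating_triple, added, removed

    with open("single_value_corrections.tsv", encoding="utf-8") as f:   # placeholder file name
        for line in f:
            print(parse_correction(line))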

  4. Wikidata Companies Graph

    • zenodo.org
    • data.hellenicdataservice.gr
    application/gzip
    Updated Aug 5, 2020
    Cite
    Pantelis Chronis; Pantelis Chronis (2020). Wikidata Companies Graph [Dataset]. http://doi.org/10.5281/zenodo.3971752
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Aug 5, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Pantelis Chronis; Pantelis Chronis
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains information about commercial organizations (companies) and their relations with other commercial organizations, persons, products, locations, groups and industries. The dataset has the form of a graph. It has been produced by the SmartDataLake project (https://smartdatalake.eu), using data collected from Wikidata (https://www.wikidata.org).

  5. Wikidata Politically Exposed Persons

    • opensanctions.org
    Updated Apr 18, 2024
    Cite
    Wikidata (2024). Wikidata Politically Exposed Persons [Dataset]. https://www.opensanctions.org/datasets/wd_peps/
    Explore at:
    Available download formats: application/json+ftm, json, application/json+senzing, csv, txt
    Dataset updated
    Apr 18, 2024
    Dataset authored and provided by
    Wikidata (https://wikidata.org/)
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Profiles of politically exposed persons from Wikidata, the structured data version of Wikipedia.

  6. A dataset of scholarly journals in wikidata : (selected) external...

    • zenodo.org
    zip
    Updated Nov 22, 2022
    Cite
    Alexis-Michel Mugabushaka; Alexis-Michel Mugabushaka (2022). A dataset of scholarly journals in wikidata : (selected) external identifiers [Dataset]. http://doi.org/10.5281/zenodo.6347127
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 22, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alexis-Michel Mugabushaka; Alexis-Michel Mugabushaka
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    For an updated list, see:

    Matching OpenAlex venues to Wikidata identifiers

    Motivation: the selective vs. inclusive approach in bibliometric databases

    An important difference between bibliometric databases is their “inclusion policy”.

    Some databases, like Web of Science and Scopus, select the sources they index, while others, like Dimensions and OpenAlex, are more inclusive (for example, they index all data from a given source such as Crossref).

    WOS

    “[S]electivity remained a hallmark of coverage because Garfield had decided early on to focus on internationally influential journals (...).”

    SCOPUS

    Serial content (i.e., journals, conference proceedings, and book series) submitted for possible inclusion in Scopus by editors and publishers is reviewed and selected, based on criteria of scientific quality and rigor. This selection process is carried out by an external Content Selection and Advisory Board (CSAB) of editorially independent scientists, each of whom is a subject matter expert in their respective field. This ensures that only high-quality curated content is indexed in the database and affirms the trustworthiness of Scopus.

    Dimensions

    We have decided to take an “inclusive” approach to the publications we index in Dimensions. We believe that Dimensions should be a comprehensive data source, not a judgment call, and so we index as broad a swath of content as possible and have developed a number of features (e.g., the Dimensions API, journal list filters that limit search results to journals that appear in sources such as Pubmed or the 2015 Australian ERA6 journal list) that allow users to filter and select the data that is most relevant to their specific needs.



    Using Wikidata to enable the filtering of “venue subsets” in OpenAlex

    We are interested in creating subsets of venues in OpenAlex (for example for comparative analysis with inclusive databases or other use cases). This would require matching identifiers of OpenAlex venues to other identifiers.

    Thanks to WikiCite, a project to record and link scholarly data, Wikidata has a large collection of metadata related to scholarly journals. This repository provides a subset of the scholarly journals in Wikidata, focusing mainly on external identifiers.

    The dataset will be used to explore the extent to which Wikidata journal external identifiers can be used to select the content in OpenAlex.

    (see here a list of openly available lists of journals)

    Dataset creation & Documentation

    Some numbers:

    Number of journals in Wikidata: 113,797; with issn_l: 95,888; with OpenAlex venue id: 29,150. (A sample query using these properties appears after the lists below.)

    External identifiers

    https://www.wikidata.org/wiki/Property:P236 # ext_id_issn

    https://www.wikidata.org/wiki/Property:P7363 # ext_id_issn_l

    https://www.wikidata.org/wiki/Property:P8375 # ext_id_crossref_journal_id

    https://www.wikidata.org/wiki/Property:P1055 # ext_id_nlm_unique_id

    https://www.wikidata.org/wiki/Property:P1058 # ext_id_era_journal_id

    https://www.wikidata.org/wiki/Property:P1250 # ext_id_danish_bif_id

    https://www.wikidata.org/wiki/Property:P10283 # ext_id_openalex_id

    https://www.wikidata.org/wiki/Property:P1156 # ext_id_scopus_source_id


    Indexing services

    https://www.wikidata.org/wiki/Property:P8875

    https://www.wikidata.org/wiki/Q371467 # Scopus

    https://www.wikidata.org/wiki/Q104047209 # Science Citation Index Expanded

    https://www.wikidata.org/wiki/Q22908122 # Emerging Sources Citation Index

    https://www.wikidata.org/wiki/Q1090953 # Social Sciences Citation Index

    https://www.wikidata.org/wiki/Q713927 # Arts and Humanities Citation index
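    To illustrate how such identifiers can be retrieved directly (this query is an example, not part of the dataset; treating Q5633421, "scientific journal", as the journal class is an assumption), here is a Python sketch against the public Wikidata SPARQL endpoint:

    import requests

    # Example query: journals with an ISSN-L (P7363) and, where present, an OpenAlex ID (P10283).
    SPARQL = """
    SELECT ?journal ?issnL ?openalex WHERE {
      ?journal wdt:P31 wd:Q5633421 .                  # instance of: scientific journal (assumed class)
      ?journal wdt:P7363 ?issnL .                     # ISSN-L
      OPTIONAL { ?journal wdt:P10283 ?openalex . }    # OpenAlex ID
    }
    LIMIT 100
    """

    resp = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": SPARQL, "format": "json"},
        headers={"User-Agent": "journal-subset-example/0.1"},
        timeout=60,
    )
    for row in resp.json()["results"]["bindings"]:
        print(row["journal"]["value"], row["issnL"]["value"], row.get("openalex", {}).get("value", ""))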

  7. Wikidata-14M Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Jul 12, 2021
    Cite
    Kholoud Alghamdi; Miaojing Shi; Elena Simperl (2021). Wikidata-14M Dataset [Dataset]. https://paperswithcode.com/dataset/wikidata-14m
    Explore at:
    Dataset updated
    Jul 12, 2021
    Authors
    Kholoud Alghamdi; Miaojing Shi; Elena Simperl
    Description

    Wikidata-14M is a recommender system dataset for recommending items to Wikidata editors. It consists of 220,000 editors responsible for 14 million interactions with 4 million items.

  8. Wikidata Dataset

    • paperswithcode.com
    Updated Dec 31, 2023
    + more versions
    Cite
    (2023). Wikidata Dataset [Dataset]. https://paperswithcode.com/dataset/wikidata
    Explore at:
    Dataset updated
    Dec 31, 2023
    Description

    Wikidata is a free and open knowledge base that can be read and edited by both humans and machines. It acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others.

  9. Wikidata subset with revision history information [RDF]

    • zenodo.org
    bin
    Updated Jun 6, 2022
    Cite
    Alejandro Gonzalez-Hevia; Alejandro Gonzalez-Hevia (2022). Wikidata subset with revision history information [RDF] [Dataset]. http://doi.org/10.5281/zenodo.6613875
    Explore at:
    Available download formats: bin
    Dataset updated
    Jun 6, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alejandro Gonzalez-Hevia; Alejandro Gonzalez-Hevia
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is composed of 300 instances from each of the 100 most important classes in Wikidata, for a total of around 30,000 entities and 390,000 triples. The dataset is geared towards knowledge graph refinement models that leverage edit history information from the graph. There are two versions of the dataset:

    • The static version (files postfixed with '_static') contains the simple statements of each entity fetched from Wikidata.
    • The dynamic version (files postfixed with '_dynamic') contains information about the operations and revisions made to these entities, and the triples that were added or removed.

    Each version is split into three subsets: train, validation (val), and test. Each split contains every entity from the dataset. The train split contains the first 70% of revisions made to each entity, the validation split contains the revisions from 70% to 85%, and the test split contains the last 15% of revisions.

    This is a sample from the static datasets:

    wd:Q217432 a uo:entity ;
      wdt:P1082 1.005904e+06 ;
      wdt:P1296 "0052280" ;
      wdt:P1791 wd:Q18704103 ;
      wdt:P18 "Pitakwa.jpg" ;
      wdt:P244 "n80066826" ;
      wdt:P571 "+1912-00-00T00:00:00Z" ;
      wdt:P6766 "421180027" .

    Each entity has the type uo:entity, and contains the statements added during that time period following Wikidata's data model.

    In the following code snippet we show an example from the dynamic dataset:

    uo:rev703872813 a uo:revision ;
      uo:timestamp "2018-06-28T22:31:32Z" .
    
    uo:op703872813_0 a uo:operation ;
      uo:fromRevision uo:rev703872813 ;
      uo:newObject wd:Q82955 ;
      uo:opType uo:add ;
      uo:revProp wdt:P106 ;
      uo:revSubject wd:Q6097419 .
    
    uo:op703878666_0 a uo:operation ;
      uo:fromRevision uo:rev703878666 ;
      uo:opType uo:remove ;
      uo:prevObject wd:Q1108445 ;
      uo:revProp wdt:P460 ;
      uo:revSubject wd:Q1147883 .

    This dataset is composed of revisions, each of which has a timestamp. Each revision is composed of 1 to n operations, each of which changes one statement of the entity. There are two types of operations: uo:add and uo:remove. In both cases, the property and the subject being modified are given by the uo:revProp and uo:revSubject properties. In the case of additions, the uo:newObject property records the new object (and, where applicable, uo:prevObject records the previous one). In the case of removals, a uo:prevObject property records the object that was removed.
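    A minimal sketch of reading one of the dynamic files with rdflib follows (the namespace URI for the uo: prefix and the file name are placeholders; use the values declared in the actual dump):

    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF

    UO = Namespace("http://example.org/uo#")        # placeholder; use the prefix declared in the dump

    g = Graph()
    g.parse("train_dynamic.ttl", format="turtle")   # placeholder file name

    # Count add and remove operations.
    counts = {"add": 0, "remove": 0}
    for op in g.subjects(RDF.type, UO.operation):
        op_type = g.value(op, UO.opType)
        if op_type == UO.add:
            counts["add"] += 1
        elif op_type == UO.remove:
            counts["remove"] += 1

    print(counts)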

  10. Wikidata Dump NA

    • zenodo.org
    application/gzip, bin +1
    Updated Jun 12, 2023
    + more versions
    Cite
    Benno Fünfstück; Benno Fünfstück (2023). Wikidata Dump NA [Dataset]. http://doi.org/10.5281/zenodo.8025733
    Explore at:
    Available download formats: json, application/gzip, bin
    Dataset updated
    Jun 12, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Benno Fünfstück; Benno Fünfstück
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    RDF dump of Wikidata produced with wdumper.


    View on wdumper

    entity count: 425468, statement count: 11624839, triple count: 25332332

  11. Wikidata subset with revision history information [JSON]

    • zenodo.org
    • explore.openaire.eu
    bin
    Updated Jun 6, 2022
    Cite
    Alejandro Gonzalez-Hevia; Alejandro Gonzalez-Hevia (2022). Wikidata subset with revision history information [JSON] [Dataset]. http://doi.org/10.5281/zenodo.6614264
    Explore at:
    Available download formats: bin
    Dataset updated
    Jun 6, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alejandro Gonzalez-Hevia; Alejandro Gonzalez-Hevia
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset consists of the complete revision history of every instance of the 100 most important classes in Wikidata. It contains 9.3 million entities and around 450 million revisions made to those entities. This dataset was exported from a MongoDB database. After decompressing the files, the resulting JSON files can be imported into MongoDB using the following commands:

    mongoimport --db=db_name --collection=wd_entities --file=wd_entities.json
    mongoimport --db=db_name --collection=wd_revisions --file=wd_revisions.json
    

    Make sure that db_name is replaced by the database where this data will be imported. A sample query against the imported collections is sketched after the schema lists below.

    Documents within the wd_entities collection have the following schema:

    • id: Internal id of the entity used by Wikidata (e.g. 8195238).
    • entity_id: Public id of the entity in Wikidata (e.g. 'Q42')
    • class_ids: List of classes that the entity belongs to (e.g. ['Q5', 'Q100'])
    • entity_json: JSON contents of the entity, following Wikidata's JSON data model (https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_json.html).

    Documents within the wd_revisions collection have the following schema:

    • id: Identifier of the revision (e.g. 15921539)
    • entity_id: Public id of the entity in Wikidata affected by this revision (e.g. 'Q42')
    • class_ids: List of classes that the entity affected by this revision belongs to (e.g. ['Q5', 'Q100'])
    • parent_id: Identifier of the previous revision to this one, if it exists (e.g. 15921214)
    • timestamp: Date when the revision was made, following the ISO 8601 format (e.g. +2019-05-27T09:31:10Z)
    • username: Username of the user that made the revision.
    • comment: Comments made by the user in the revision, if any.
    • entity_diff: List of operations made in this revision, following the JSON Patch format.
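    A short pymongo sketch of querying the imported collections (the connection string, database name, and entity id 'Q42' are placeholders):

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")   # placeholder connection string
    db = client["db_name"]                              # replace with the chosen database name

    # Entity record and its classes.
    entity = db.wd_entities.find_one({"entity_id": "Q42"})
    print(entity["class_ids"])

    # Revision history of the same entity, oldest first.
    for rev in db.wd_revisions.find({"entity_id": "Q42"}).sort("timestamp", 1):
        print(rev["timestamp"], rev["username"], rev.get("comment", ""))
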
  12. Kensho Derived Wikimedia Dataset

    • kaggle.com
    Updated Jan 31, 2020
    Cite
    Kensho R&D (2020). Kensho Derived Wikimedia Dataset [Dataset]. https://www.kaggle.com/kenshoresearch/kensho-derived-wikimedia-data/activity
    Explore at:
    Dataset updated
    Jan 31, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kensho R&D
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0), https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Kensho Derived Wikimedia Dataset

    Wikipedia, the free encyclopedia, and Wikidata, the free knowledge base, are crowd-sourced projects supported by the Wikimedia Foundation. Wikipedia is nearly 20 years old and recently added its six millionth article in English. Wikidata, its younger machine-readable sister project, was created in 2012 but has been growing rapidly and currently contains more than 75 million items.

    These projects contribute to the Wikimedia Foundation's mission of empowering people to develop and disseminate educational content under a free license. They are also heavily utilized by computer science research groups, especially those interested in natural language processing (NLP). The Wikimedia Foundation periodically releases snapshots of the raw data backing these projects, but these are in a variety of formats and were not designed for use in NLP research. In the Kensho R&D group, we spend a lot of time downloading, parsing, and experimenting with this raw data. The Kensho Derived Wikimedia Dataset (KDWD) is a condensed subset of the raw Wikimedia data in a form that we find helpful for NLP work. The KDWD has a CC BY-SA 3.0 license, so feel free to use it in your work too.


    This particular release consists of two main components - a link annotated corpus of English Wikipedia pages and a compact sample of the Wikidata knowledge base. We version the KDWD using the raw Wikimedia snapshot dates. The version string for this dataset is kdwd_enwiki_20191201_wikidata_20191202 indicating that this KDWD was built from the English Wikipedia snapshot from 2019 December 1 and the Wikidata snapshot from 2019 December 2. Below we describe these components in more detail.

    Example Notebooks

    Dive right in by checking out some of our example notebooks:

    Updates / Changelog

    • initial release 2020-01-31

    File Summary

    • Wikipedia
      • page.csv (page metadata and Wikipedia-to-Wikidata mapping)
      • link_annotated_text.jsonl (plaintext of Wikipedia pages with link offsets)
    • Wikidata
      • item.csv (item labels and descriptions in English)
      • item_aliases.csv (item aliases in English)
      • property.csv (property labels and descriptions in English)
      • property_aliases.csv (property aliases in English)
      • statements.csv (truthy qpq statements)

    Three Layers of Data

    The KDWD is three connected layers of data. The base layer is a plain text English Wikipedia corpus, the middle layer annotates the corpus by indicating which text spans are links, and the top layer connects the link text spans to items in Wikidata. Below we'll describe these layers in more detail.

    [Figure: diagram of the three connected layers of KDWD data]

    Wikipedia Sample

    The first part of the KDWD is derived from Wikipedia. In order to create a corpus of mostly natural text, we restrict our English Wikipedia page sample to those that:

    From these pages we construct a corpus of link annotated text. We store this data in a single JSON Lines file with one page per line. Each page object has the following format:

    page = {
      "page_id": 12,   # wikipedia page id of annotated page
      "sections": [...]  # list of section objects
    }
    
    section = {
      "name": "Introduction",             # section header
      "text": "Anarchism is an ...",          # plaintext of section
      "link_offsets": [16, 35, 49, ...],        # list of anchor text offsets
      "link_lengths": [18, 9, 17, ...],        # list of anchor text lengths
      "target_page_ids": [867979, 23040, 586276, ...] # list of link target page ids
    }
    

    The text attribute of each section object contains our parse of the section’s wikitext markup into plaintext. Text spans that represent links are identified via the attributes link_offsets, link_lengths, and target_page_ids.
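    As a small usage illustration (assuming the JSON Lines layout described above), the anchor texts and their link targets can be recovered from these attributes as follows:

    import json

    def anchors(section):
        # Yield (anchor_text, target_page_id) pairs for one section object.
        text = section["text"]
        for offset, length, target in zip(section["link_offsets"],
                                          section["link_lengths"],
                                          section["target_page_ids"]):
            yield text[offset:offset + length], target

    with open("link_annotated_text.jsonl", encoding="utf-8") as f:
        for line in f:
            page = json.loads(line)
            for section in page["sections"]:
                for anchor, target in anchors(section):
                    print(page["page_id"], target, anchor)
            break   # first page only, for illustration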

    Wikidata Sample

    The second part of the KDWD is derived from Wikidata. Because more people are familiar with Wikipedia than Wikidata, we provide more background here than in the previous section. Wikidata provides centralized storage of structured data for all Wikimedia projects. The core Wikidata concepts are items, properties, and statements.

    In Wikidata, items are used to represent all the things in human knowledge, including topics, concepts, and objects. For example, the "1988 Summer Olympics", "love", "Elvis Presley", and "gorilla" are all items in Wikidata.

    -- https://www.wikidata.org/wiki/Help:Items

    A property describes the data value of a statement and can be thought of as a category of data, for example "color" for the data value "blue".

    -- https://www.wikidata.org/wiki/Help:Properties

    A statement is how the information we know about an item - the data we have about it - gets recorded in Wikidata. This happens by pairing a property with at least one data value

    -- https://www.wikidata.org/wiki/Help:Statements

    [Image: example statements from the Wikidata item for Grace Hopper]

    The image above shows several statements from the Wikidata item for Grace Hopper. We can think about these statements as triples with the form (item, property, data value).

    [Image: the same statements shown as a table of (item, property, data value) triples]

    In the first statement (Grace Hopper, date of birth, 9 December 1906) the data value represents a time. However, data values can have several different types (e.g., time, string, globecoordinate, item, …). If the data value in a statement triple is a Wikidata item, we call it a qpq-statement (note that each item has a unique ID beginning with Q and each property has a unique ID beginning with P). We can think of qpq-statements as triples of the form (source item, property, target item). The qpq-statements in the image above are:

    In order to construct a compact Wikidata sample that is relevant to our Wikipedia sample, we start with all statements in Wikidata and filter down to those that:

    • have a data value that is a Wikidata item (i.e., qpq-statements)
    • have a source item associated with a Wikipedia page from our Wikipedia sample
    • are
  13. wikidata-en-descriptions

    • huggingface.co
    Updated Aug 5, 2023
    + more versions
    Cite
    Daniel Erenrich (2023). wikidata-en-descriptions [Dataset]. https://huggingface.co/datasets/derenrich/wikidata-en-descriptions
    Explore at:
    Dataset updated
    Aug 5, 2023
    Authors
    Daniel Erenrich
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The derenrich/wikidata-en-descriptions dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
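    As a usage illustration (not taken from the dataset card), the dataset can presumably be loaded with the Hugging Face datasets library; the split name "train" is an assumption, and the columns are not documented here, so the snippet only inspects the first record:

    from datasets import load_dataset

    ds = load_dataset("derenrich/wikidata-en-descriptions", split="train")   # split name assumed
    print(ds[0])   # inspect the first record to see the available columns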

  14. Wikidata

    • web.archive.org
    full json dump +3
    Updated Oct 23, 2018
    Cite
    Wikimedia (2018). Wikidata [Dataset]. https://www.wikidata.org/wiki/Wikidata:Data_access
    Explore at:
    Available download formats: simplified ("truthy") rdf n-triples dump, sparql endpoint, full json dump, full rdf turtle dump
    Dataset updated
    Oct 23, 2018
    Dataset provided by
    Wikimedia Foundation (http://www.wikimedia.org/)
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Wikidata offers a wide range of general data about our universe as well as links to other databases. The data is published under the CC0 "Public domain dedication" license. It can be edited by anyone and is maintained by Wikidata's editor community.
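    As an illustration of the access routes linked above, the Data access page also documents a per-entity JSON interface (Special:EntityData); a minimal Python sketch, using Q42 (Douglas Adams) as an example entity:

    import requests

    # Fetch one entity as JSON via the linked-data interface.
    url = "https://www.wikidata.org/wiki/Special:EntityData/Q42.json"
    entity = requests.get(url, timeout=30).json()["entities"]["Q42"]

    print(entity["labels"]["en"]["value"])   # English label
    print(len(entity["claims"]))             # number of properties with statements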

  15. wikidata-20220103-all.json.gz

    • academictorrents.com
    bittorrent
    Updated Jan 3, 2022
    Cite
    wikidata.org (2022). wikidata-20220103-all.json.gz [Dataset]. https://academictorrents.com/details/229cfeb2331ad43d4706efd435f6d78f40a3c438
    Explore at:
    Available download formats: bittorrent
    Dataset updated
    Jan 3, 2022
    Dataset provided by
    Wikidata (https://wikidata.org/)
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    A BitTorrent file to download the data titled 'wikidata-20220103-all.json.gz'.

  16. Data from: SchemaTree: Maximum-Likelihood Property Recommendation for...

    • omicsdi.org
    xml
    Updated Mar 20, 2024
    Cite
    Harth A (2024). SchemaTree: Maximum-Likelihood Property Recommendation for Wikidata [Dataset]. https://www.omicsdi.org/dataset/biostudies-literature/S-EPMC7250627
    Explore at:
    Available download formats: xml
    Dataset updated
    Mar 20, 2024
    Authors
    Harth A
    Variables measured
    Unknown
    Description

    Wikidata is a free and open knowledge base which can be read and edited by both humans and machines. It acts as a central storage for the structured data of several Wikimedia projects. To improve the process of manually inserting new facts, the Wikidata platform features an association rule-based tool to recommend additional suitable properties. In this work, we introduce a novel approach to provide such recommendations based on frequentist inference. We introduce a trie-based method that can efficiently learn and represent property set probabilities in RDF graphs. We extend the method by adding type information to improve recommendation precision and introduce backoff strategies which further increase the performance of the initial approach for entities with rare property combinations. We investigate how the captured structure can be employed for property recommendation, analogously to the Wikidata PropertySuggester. We evaluate our approach on the full Wikidata dataset and compare its performance to the state-of-the-art Wikidata PropertySuggester, outperforming it in all evaluated metrics. Notably, we could reduce the average rank of the first relevant recommendation by 71%.
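    As a purely illustrative sketch of the general idea of counting property-set frequencies with a prefix tree (this is not the authors' SchemaTree implementation, and it omits the type information and backoff strategies described above):

    class TrieNode:
        def __init__(self):
            self.children = {}
            self.count = 0

    root = TrieNode()

    def insert(property_set):
        # Insert one entity's property set, incrementing counts along its sorted path.
        node = root
        for prop in sorted(property_set):
            node = node.children.setdefault(prop, TrieNode())
            node.count += 1

    # Toy data: the set of properties used on each of three entities.
    for props in [{"P31", "P21", "P569"}, {"P31", "P21"}, {"P31", "P569"}]:
        insert(props)

    # Each first-level node counts the property sets whose sorted order starts with that property.
    for prop, node in sorted(root.children.items()):
        print(prop, node.count)   # prints: P21 2, then P31 1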

  17. Wikidata dump 2017-12-27

    • zenodo.org
    bz2
    Updated Jan 24, 2020
    Cite
    WikiData; WikiData (2020). Wikidata dump 2017-12-27 [Dataset]. http://doi.org/10.5281/zenodo.1211767
    Explore at:
    Available download formats: bz2
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    WikiData; WikiData
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description
  18. Wikidata Dump partial-wiki

    • zenodo.org
    • commons.datacite.org
    application/gzip, bin +1
    Updated Aug 25, 2022
    Cite
    Benno Fünfstück; Benno Fünfstück (2022). Wikidata Dump partial-wiki [Dataset]. http://doi.org/10.5281/zenodo.7019643
    Explore at:
    Available download formats: bin, json, application/gzip
    Dataset updated
    Aug 25, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Benno Fünfstück; Benno Fünfstück
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    RDF dump of Wikidata produced with wdumper.

    basic filter
    View on wdumper

    entity count: 0, statement count: 0, triple count: 38

  19. Wikidata Dump

    • zenodo.org
    • commons.datacite.org
    application/gzip, bin +1
    Updated Jan 24, 2020
    + more versions
    Cite
    Benno Fünfstück; Benno Fünfstück (2020). Wikidata Dump [Dataset]. http://doi.org/10.5281/zenodo.3571488
    Explore at:
    Available download formats: json, bin, application/gzip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Benno Fünfstück; Benno Fünfstück
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    RDF dump of Wikidata produced with wdumper.

    View on wdumper: https://tools.wmflabs.org/wdumps/dump/22

    entity count: 0, statement count: 0, triple count: 0
    
  20. Wikidata Dump companies

    • explore.openaire.eu
    • zenodo.org
    Updated Jan 15, 2022
    + more versions
    Cite
    Benno Fünfstück (2022). Wikidata Dump companies [Dataset]. http://doi.org/10.5281/zenodo.5852002
    Explore at:
    Dataset updated
    Jan 15, 2022
    Authors
    Benno Fünfstück
    Description

    RDF dump of Wikidata produced with wdumper.

    companies; simple statement off; full statement mode complete; KR, EN

    View on wdumper

    entity count: 0, statement count: 0, triple count: 0
