License (derived automatically): CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
A BitTorrent file for downloading the dataset 'wikidata-20220103-all.json.gz'.
License: CC0 1.0, https://choosealicense.com/licenses/cc0-1.0/
Wikidata parallel descriptions en-ja
Parallel corpus for machine translation generated from the Wikidata dump (2024-05-06). Currently only the English/Japanese pair has been processed. The JSONL file is ready to train with the Hugging Face Transformers trainer for translation tasks.
Dataset Details
https://www.wikidata.org/wiki/Wikidata:Database_download
Dataset Creation
As the Wikidata description field does not always represent an exact direct translation, filtering is required for… See the full description on the dataset page: https://huggingface.co/datasets/Mitsua/wikidata-parallel-descriptions-en-ja.
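To illustrate what "ready to train" means in practice, here is a minimal Python sketch of loading the corpus with the Hugging Face datasets library. The split name and the exact column layout (a "translation" dict keyed by language code, as the Transformers translation examples expect) are assumptions; check the dataset page for the actual schema.

from datasets import load_dataset

# Load the parallel corpus from the Hugging Face Hub.
ds = load_dataset("Mitsua/wikidata-parallel-descriptions-en-ja", split="train")

# Inspect one record; the {"translation": {"en": ..., "ja": ...}} layout is
# an assumed schema, not confirmed by the description above.
print(ds[0])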
License (derived automatically): Attribution-ShareAlike 3.0 (CC BY-SA 3.0), https://creativecommons.org/licenses/by-sa/3.0/
Regularly published dataset of PageRank scores for Wikidata entities. The underlying link graph is formed by the union of all links across all Wikipedia language editions. Computation is performed by Andreas Thalhammer with 'danker', available at https://github.com/athalhammer/danker. If you find the downloads here useful, please feel free to leave a GitHub ⭐ at the repository and buy me a ☕: https://www.buymeacoffee.com/thalhamm
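For readers unfamiliar with the computation, here is an illustrative Python sketch of PageRank over a toy link graph using networkx. It only mirrors the idea behind danker (PageRank over the union of the Wikipedia language editions' link graphs); the edges and damping factor below are example choices, not danker's actual pipeline.

import networkx as nx

# Hypothetical inter-article links, keyed by Wikidata QID.
g = nx.DiGraph()
g.add_edges_from([
    ("Q42", "Q5"),
    ("Q42", "Q36180"),
    ("Q5", "Q36180"),
])

# PageRank with the common damping factor of 0.85.
scores = nx.pagerank(g, alpha=0.85)
print(sorted(scores.items(), key=lambda kv: -kv[1]))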
License (derived automatically): Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This is a dump of Wikidata from 2018-12-17 in JSON format. It is no longer available from Wikidata. It was originally downloaded from https://dumps.wikimedia.org/other/wikidata/20181217.json.gz and recompressed to fit on Zenodo.
License (derived automatically): CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains quality labels for 5000 Wikidata items applied by Wikidata editors. The labels correspond to the quality scale described at https://www.wikidata.org/wiki/Wikidata:Item_quality

Each line is a JSON blob with the following fields:
- item_quality: The labeled quality class (A-E)
- rev_id: The revision identifier of the version of the item that was labeled
- strata: The size of the item in bytes at the time it was sampled
- page_len: The actual size of the item in bytes
- page_title: The QID of the item
- claims: A dictionary including P31 "instance of" values for filtering out certain types of items

The number of observations by class is:
- A class: 322
- B class: 438
- C class: 1773
- D class: 997
- E class: 1470
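A minimal Python sketch of reading the file, assuming one JSON object per line as described above; the filename is hypothetical, and the claims dictionary is assumed to map property IDs to lists of QIDs.

import json
from collections import Counter

counts = Counter()
with open("item_quality_labels.jsonl", encoding="utf-8") as f:
    for line in f:
        obs = json.loads(line)
        # Example filter: skip items whose P31 ("instance of") marks them as
        # a Wikimedia disambiguation page (Q4167410).
        if "Q4167410" in obs.get("claims", {}).get("P31", []):
            continue
        counts[obs["item_quality"]] += 1

# Compare the resulting distribution with the per-class counts listed above.
print(counts)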
License (derived automatically): CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
Wikidata all data dump. Date: 2020-02-10. Format: JSON. File format: gz.
License (derived automatically): Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This dataset contains information about commercial organizations (companies) and their relations with other commercial organizations, persons, products, locations, groups and industries. The dataset has the form of a graph. It has been produced by the SmartDataLake project (https://smartdatalake.eu), using data collected from Wikidata (https://www.wikidata.org).
License (derived automatically): Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
derenrich/wikidata-en-descriptions dataset hosted on Hugging Face and contributed by the HF Datasets community
License (derived automatically): Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This is a collection of gender indicators from Wikidata and Wikipedia for human biographies. Data is derived from the 2016-01-03 Wikidata snapshot. Each file describes the humans in Wikidata aggregated by gender (Property:P21) and disaggregated by the following Wikidata properties:
- Date of Birth (P569)
- Date of Death (P570)
- Place of Birth (P19)
- Country of Citizenship (P27)
- Ethnic Group (P172)
- Field of Work (P101)
- Occupation (P106)
- Wikipedia Language ("Sitelinks")

Further aggregations of the data are:
- World Map (countries derived from place of birth and citizenship)
- World Cultures (Inglehart-Welzel map applied to World Map)
- Gender Co-Occurrence (humans with multiple genders)

Wikidata labels have been translated to English for convenience where possible. You may still see values given as QIDs, which means no English translation was available. Where there were multiple values, such as for occupation, the gender is counted as co-occurring with each occupation separately. For more information, see http://wigi.wmflabs.org/
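The following is an illustrative pandas sketch of that aggregation logic, not the project's actual code; the column names and example rows are hypothetical.

import pandas as pd

# Toy table of biographies: one row per (human, occupation) pair.
bios = pd.DataFrame({
    "gender":     ["Q6581097", "Q6581072", "Q6581072"],  # male, female, female
    "occupation": ["Q36180",   "Q36180",   "Q82955"],    # writer, writer, politician
})

# Aggregate by gender (P21), disaggregated by occupation (P106).
counts = bios.groupby(["occupation", "gender"]).size().unstack(fill_value=0)
print(counts)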
License (derived automatically): CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
Wikidata dump retrieved from https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2 on 27 Dec 2017
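A minimal Python sketch of streaming entities out of such a dump without fully decompressing it. The dump is one large JSON array, but in practice it stores one entity object per line (with wrapping brackets and trailing commas), which allows line-by-line parsing; the filename matches the URL above.

import bz2
import json

with bz2.open("latest-all.json.bz2", mode="rt", encoding="utf-8") as f:
    for line in f:
        line = line.rstrip().rstrip(",")
        if line in ("[", "]", ""):
            continue  # skip the array delimiters
        entity = json.loads(line)
        print(entity["id"])  # e.g. "Q42"
        break  # remove this to process the full dump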
License (derived automatically): CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
A copy of a dump that was available from Wikimedia: https://dumps.wikimedia.org/wikidatawiki/entities/
License (derived automatically): CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
Wikidata: A Free and Open Knowledge Base
- Accessible: Readable and editable by humans and machines alike.
- Central Hub: Serves as the core storage for structured data across Wikimedia's sister projects, such as Wikipedia, Wikivoyage, Wiktionary, Wikisource, and more.
- Current Edition: This torrent represents an unofficial dump of Wikidata as of January 1st, 2024.
License (derived automatically): CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
Wikidata is a free and open knowledge base that can be read and edited by both humans and machines. Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others.
License (derived automatically): Attribution 3.0 (CC BY 3.0), https://creativecommons.org/licenses/by/3.0/
derenrich/wikidata-enwiki-categories-and-statements dataset hosted on Hugging Face and contributed by the HF Datasets community
License (derived automatically): CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a cleaned-up and annotated version of another dataset previously shared: https://figshare.com/articles/dataset/Wikidata_Constraints_Violations_-_July_2017/7712720

It contains corrections for Wikidata constraint violations extracted from the July 1st, 2018 Wikidata full history dump. It has been created as part of a work named Neural Knowledge Base Repairs by Thomas Pellissier Tanon and Fabian Suchanek. An example of code making use of this dataset is available on GitHub: https://github.com/Tpt/bass-materials/blob/master/corrections_learning.ipynb

The following constraints are considered:
* conflicts with: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Conflicts_with
* distinct values: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Unique_value
* inverse and symmetric: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Inverse and https://www.wikidata.org/wiki/Help:Property_constraints_portal/Symmetric
* item requires statement: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Item
* one of: https://www.wikidata.org/wiki/Help:Property_constraints_portal/One_of
* single value: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Single_value
* type: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Type
* value requires statement: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Target_required_claim
* value type: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Value_type

The constraints.tsv file contains the list of most of the Wikidata constraints considered in this dataset (beware, there could be some discrepancies for type, valueType, itemRequiresClaim and valueRequiresClaim constraints). It is a tab-separated file with the following columns:
1. constraint id: the URI of the Wikidata statement describing the constraint
2. property id: the URI of the property that is constrained
3. type id: the URI of the constraint type (type, value type...). It is a Wikidata item.
4. 15 columns for the possible attributes of the constraint. If an attribute has multiple values, they are in the same cell but separated by a space. The columns are:
   * regex: https://www.wikidata.org/wiki/Property:P1793
   * exceptions: https://www.wikidata.org/wiki/Property:P2303
   * group by: https://www.wikidata.org/wiki/Property:P2304
   * items: https://www.wikidata.org/wiki/Property:P2305
   * property: https://www.wikidata.org/wiki/Property:P2306
   * namespace: https://www.wikidata.org/wiki/Property:P2307
   * class: https://www.wikidata.org/wiki/Property:P2308
   * relation: https://www.wikidata.org/wiki/Property:P2309
   * minimal date: https://www.wikidata.org/wiki/Property:P2310
   * maximum date: https://www.wikidata.org/wiki/Property:P2311
   * maximum value: https://www.wikidata.org/wiki/Property:P2312
   * minimal value: https://www.wikidata.org/wiki/Property:P2313
   * status: https://www.wikidata.org/wiki/Property:P2316
   * separator: https://www.wikidata.org/wiki/Property:P4155
   * scope: https://www.wikidata.org/wiki/Property:P5314

The other files provide, for each constraint type, the list of all corrections extracted from the edit history. The format is one line per correction with the following tab-separated values:
1. constraint id
2. revision that fixed the constraint violation
3. first violation triple subject
4. first violation triple predicate
5. first violation triple object
6. second violation triple subject (blank if no second violation triple)
7. second violation triple predicate (blank if no second violation triple)
8. second violation triple object (blank if no second violation triple)
9. separator (not useful)
10. subject of the first triple in the correction
11. predicate of the first triple in the correction
12. object of the first triple in the correction
13. whether the first triple in the correction is an addition or a deletion
14. subject of the second triple in the correction (might not exist)
15. predicate of the second triple in the correction (might not exist)
16. object of the second triple in the correction (might not exist)
17. whether the second triple in the correction is an addition or a deletion (might not exist)
18. description of the subject of the first violation triple, encoded in JSON
19. description of the object of the first violation triple, encoded in JSON (might be empty for literals)
20. description of the term of the second triple that has not already been described by the two previous descriptions (might be empty for literals or if there is no second triple)
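A minimal Python sketch of iterating over one of the per-constraint-type correction files; the filename is hypothetical, and the column indices follow the 20-field layout listed above.

import csv
import json

with open("corrections_type.tsv", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="\t"):
        constraint_id, fixing_revision = row[0], row[1]
        first_violation_triple = tuple(row[2:5])  # fields 3-5
        # Field 18: JSON description of the first violation triple's subject
        # (guard against short or empty rows).
        subject_description = json.loads(row[17]) if len(row) > 17 and row[17] else None
        print(constraint_id, fixing_revision, first_violation_triple)
        break  # remove this to process the full file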
License (derived automatically): Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This dataset contains triples curated from Wikidata surrounding news events with causal relations, and is released as part of our WWW'23 paper, "Event Prediction using Case-Based Reasoning over Knowledge Graphs".
Starting from a set of classes that we consider to be types of "events", we queried Wikidata to collect entities that were instances of an event class and that were connected to another such event entity by a causal triple (https://www.wikidata.org/wiki/Wikidata:List_of_properties/causality). For all such cause-effect event pairs, we then collected a 3-hop neighborhood of outgoing triples.
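Below is an illustrative Python sketch of the kind of SPARQL query involved. The event class (Q1190554, "occurrence") and the causal property (P1542, "has effect") are example choices from the property list linked above, not necessarily the exact ones used for this dataset.

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                       agent="causal-events-example/0.1")
sparql.setQuery("""
SELECT ?cause ?effect WHERE {
  ?cause wdt:P31/wdt:P279* wd:Q1190554 .   # cause is an instance of an event class
  ?effect wdt:P31/wdt:P279* wd:Q1190554 .  # so is the effect
  ?cause wdt:P1542 ?effect .               # linked by "has effect"
} LIMIT 10
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["cause"]["value"], "->", row["effect"]["value"])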
License (derived automatically): Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
UMLS_Wikidata is a German biomedical entity linking knowledge base that provides good coverage for German entity linking datasets such as WikiMed-DE-BEL. The knowledge base is created by filtering Wikidata for items that contain a Concept Unique Identifier (CUI) from UMLS. Each entry in the knowledge base consists of a Wikidata QID, label, description, UMLS CUI, and aliases. The resulting KB has 731,414 Wikidata QIDs, 599,330 unique CUIs, and 671,797 unique (mention, CUI) pairs, where mentions include labels and aliases.
License (derived automatically): Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This dataset contains the predicted topic(s) for (almost) every Wikipedia article across languages. Each row contains the following columns: Qid, topic, probability, page_id, page_title, wiki_db, where:
* Qid: Wikidata item ID
* topic: topic based on the ORES draft topic model (https://www.mediawiki.org/wiki/Talk:ORES/Draft_topic)
* probability: probability of belonging to the topic
* page_id: page ID
* page_title: page title
* wiki_db: wiki database; for example, English Wikipedia is enwiki

For example: Q1000211,Geography.Regions.Europe.Western_Europe,1.0,166578,Frières-Faillouël,euwiki

Topics are predicted using the Wikidata-Topic model developed by Isaac Johnson (https://github.com/geohci/wikidata-topic-model). The source code to create this dataset can be found here: https://github.com/digitalTranshumant/wikidata-topic-model
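A minimal pandas sketch of loading the file, assuming a headerless CSV in the column order described above; the filename is hypothetical.

import pandas as pd

cols = ["Qid", "topic", "probability", "page_id", "page_title", "wiki_db"]
df = pd.read_csv("article_topics.csv", names=cols, header=None)

# Example: confident geography predictions on English Wikipedia.
subset = df[(df["wiki_db"] == "enwiki")
            & df["topic"].str.startswith("Geography")
            & (df["probability"] >= 0.5)]
print(subset.head())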
License (derived automatically): Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This repository contains two public knowledge graph datasets used in our paper Improving the Utility of Knowledge Graph Embeddings with Calibration. Each dataset is described below.
Note that for our experiments we split each dataset randomly 5 times into 80/10/10 train/validation/test splits. We recommend that users of our data do the same to avoid (potentially) overfitting models to a single dataset split.
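A minimal Python sketch of that protocol using scikit-learn: five random 80/10/10 splits, one per seed. The triples.tsv layout (head, relation, tail) is taken from the file descriptions below; treating the file as headerless is an assumption.

import pandas as pd
from sklearn.model_selection import train_test_split

triples = pd.read_csv("triples.tsv", sep="\t", header=None,
                      names=["head", "relation", "tail"])

for seed in range(5):
    # First carve off 20%, then halve it into validation and test.
    train, rest = train_test_split(triples, test_size=0.2, random_state=seed)
    valid, test = train_test_split(rest, test_size=0.5, random_state=seed)
    print(f"split {seed}: {len(train)} train / {len(valid)} valid / {len(test)} test")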
wikidata-authors
This dataset was extracted by querying the Wikidata API for facts about people categorized as "authors" or "writers" on Wikidata. Note that all head entities of triples are people (authors or writers), and all triples describe something about that person (e.g., their place of birth, their place of death, or their spouse). The knowledge graph has 23,887 entities, 13 relations, and 86,376 triples.
The files are as follows:
entities.tsv: A tab-separated file of all unique entities in the dataset. The fields are as follows:
eid: The unique Wikidata identifier of this entity. You can find the corresponding Wikidata page at https://www.wikidata.org/wiki/ followed by this identifier.
label: A human-readable label of this entity (extracted from Wikidata).
relations.tsv: A tab-separated file of all unique relations in the dataset. The fields are as follows:
rid: The unique Wikidata identifier of this relation. You can find the corresponding Wikidata page at https://www.wikidata.org/wiki/Property: followed by this identifier.
label: A human-readable label of this relation (extracted from Wikidata).
triples.tsv: A tab-separated file of all triples in the dataset, each in the form head entity, relation, tail entity.
fb15krr-linked
This dataset is an extended version of the FB15k+ dataset provided by [Xie et al IJCAI16]. It has been linked to Wikidata using Freebase MIDs (machine IDs) as keys; we discarded triples from the original dataset that contained entities that could not be linked to Wikidata. We also removed reverse relations following the procedure described by [Toutanova and Chen CVSC2015]. Finally, we removed existing triples labeled as False and added predicted triples labeled as True based on the crowdsourced annotations we obtained in our True or False Facts experiment (see our paper for details). The knowledge graph consists of 14,289 entities, 770 relations, and 272,385 triples.
The files are as follows:
entities.tsv: A tab-separated file of all unique entities in the dataset. The fields are as follows:
mid: The Freebase machine ID (MID) of this entity.
wiki: The corresponding unique Wikidata identifier of this entity. You can find the corresponding Wikidata page at https://www.wikidata.org/wiki/ followed by this identifier.
label: A human-readable label of this entity (extracted from Wikidata).
types: All hierarchical types of this entity, as provided by [Xie et al IJCAI16].
relations.tsv: A tab-separated file of all unique relations in the dataset. The fields are as follows:
label: The hierarchical Freebase label of this relation.
triples.tsv: A tab-separated file of all triples in the dataset, each in the form head entity, relation, tail entity.
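As a usage example, here is an illustrative pandas sketch that resolves the MIDs in fb15krr-linked triples to human-readable labels via entities.tsv. The column orders follow the field lists above; treating the files as headerless is an assumption.

import pandas as pd

entities = pd.read_csv("entities.tsv", sep="\t", header=None,
                       names=["mid", "wiki", "label", "types"])
triples = pd.read_csv("triples.tsv", sep="\t", header=None,
                      names=["head", "relation", "tail"])

# Map Freebase MIDs to their Wikidata-derived labels.
labels = entities.set_index("mid")["label"]
readable = triples.assign(head=triples["head"].map(labels),
                          tail=triples["tail"].map(labels))
print(readable.head())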
License (derived automatically): Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This project aims at proving with data that it is necessary to analyze vernacular languages when dealing with events that are described using public sources like Wikidata and Wikipedia. To retrieve and analyze events, it uses the wikivents Python package. We provide in the project directory the Jupyter notebook that processed (and/or generated) the dataset directory content. Statistics from this analysis are located in the stats directory. The main statistics are reported in the associated paper.