License: https://choosealicense.com/licenses/cc0-1.0/
Wikidata - All Entities
This Hugging Face Data Set contains the entirety of Wikidata as of the date listed below. Wikidata is a freely licensed structured knowledge graph following the wiki model of user contributions. If you build on this data please consider contributing back to Wikidata. For more on the size and other statistics of Wikidata, see: Special:Statistics. Current Dump as of: 2024-03-04
Original Source
The data contained in this repository is retrieved… See the full description on the dataset page: https://huggingface.co/datasets/Wikimedians/wikidata-all.
License: https://choosealicense.com/licenses/cc0-1.0/
Wikidata parallel descriptions en-ja
Parallel corpus for machine translation generated from the Wikidata dump of 2024-05-06. Currently only the English/Japanese pair has been processed. The JSONL file is ready to train with the Hugging Face transformers trainer for translation tasks.
Dataset Details
https://www.wikidata.org/wiki/Wikidata:Database_download
Dataset Creation
As the Wikidata description field is not an exact direct translation, filtering is required for… See the full description on the dataset page: https://huggingface.co/datasets/Mitsua/wikidata-parallel-descriptions-en-ja.
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Persons of interest profiles from Wikidata, the structured data version of Wikipedia.
License: Attribution 3.0 (CC BY 3.0), https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
derenrich/wikidata-en-descriptions-small dataset hosted on Hugging Face and contributed by the HF Datasets community
License: Attribution-ShareAlike 3.0 (CC BY-SA 3.0), https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Regularly published dataset of PageRank scores for Wikidata entities. The underlying link graph is formed by a union of all links across all Wikipedia language editions. Computation is performed by Andreas Thalhammer with 'danker', available at https://github.com/athalhammer/danker . If you find the downloads here useful please feel free to leave a GitHub ⭐ at the repository and buy me a ☕ https://www.buymeacoffee.com/thalhamm
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Wikidata dump retrieved from https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2 on 27 Dec 2017
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a JSON dump of Wikidata from 2018-12-17. It is no longer available from Wikidata. It was originally downloaded from https://dumps.wikimedia.org/other/wikidata/20181217.json.gz and recompressed to fit on Zenodo.
License: https://academictorrents.com/nolicensespecified
A BitTorrent file to download data with the title 'wikidata-20240902-all.json.bz2'
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Category-based imports from Wikidata, the structured data version of Wikipedia.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains triples curated from Wikidata surrounding news events with causal relations, and is released as part of our WWW'23 paper, "Event Prediction using Case-Based Reasoning over Knowledge Graphs".
Starting from a set of classes that we consider to be types of "events", we queried Wikidata to collect entities that were an instanceOf an event class and that were connected to another such event entity by a causal triple (https://www.wikidata.org/wiki/Wikidata:List_of_properties/causality). For all such cause-effect event pairs, we then collected a 3-hop neighborhood of outgoing triples.
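The neighborhood collection described above can be illustrated with a small breadth-first traversal. This is only a sketch under assumptions, not the authors' pipeline: the function name `k_hop_outgoing`, the in-memory list of (subject, predicate, object) tuples, and the toy node names are all hypothetical.

```python
from collections import defaultdict


def k_hop_outgoing(triples, seeds, k=3):
    """Collect every triple reachable from `seeds` by following
    at most `k` outgoing edges (hypothetical helper)."""
    # Index triples by subject so outgoing edges can be looked up quickly.
    out_edges = defaultdict(list)
    for s, p, o in triples:
        out_edges[s].append((p, o))

    collected = set()
    visited = set(seeds)
    frontier = set(seeds)
    for _ in range(k):
        next_frontier = set()
        for s in frontier:
            for p, o in out_edges.get(s, ()):
                collected.add((s, p, o))
                # Expand each node only once, which also guards against cycles.
                if o not in visited:
                    visited.add(o)
                    next_frontier.add(o)
        frontier = next_frontier
    return collected


# Toy graph with made-up node names; "X" only points *into* "A".
toy_triples = [
    ("A", "p", "B"),
    ("B", "p", "C"),
    ("C", "p", "D"),
    ("D", "p", "E"),
    ("X", "p", "A"),
]
neighborhood = k_hop_outgoing(toy_triples, {"A"}, k=3)
```

In this toy graph the triple starting at "D" lies four hops from "A" and is left out, as is the incoming edge from "X".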
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains quality labels for 5000 Wikidata items applied by Wikidata editors. The labels correspond to the quality scale described at https://www.wikidata.org/wiki/Wikidata:Item_quality
Each line is a JSON blob with the following fields:
- item_quality: The labeled quality class (A-E)
- rev_id: The revision identifier of the version of the item that was labeled
- strata: The size of the item in bytes at the time it was sampled
- page_len: The actual size of the item in bytes
- page_title: The Qid of the item
- claims: A dictionary including P31 "instance-of" values for filtering out certain types of items
The number of observations by class is:
- A class: 322
- B class: 438
- C class: 1773
- D class: 997
- E class: 1470
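As a minimal sketch of reading this line-delimited JSON, the snippet below tallies the `item_quality` field. The function name and the sample records are made up for illustration; in particular, the exact shape of the `claims` dictionary is an assumption and is not used by the tally.

```python
import json
from collections import Counter


def count_quality_classes(lines):
    """Tally the item_quality field over an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        counts[record["item_quality"]] += 1
    return counts


# Made-up sample records that only mimic the documented field names.
sample_lines = [
    '{"item_quality": "C", "rev_id": 1, "strata": 900, "page_len": 950, "page_title": "Q900001", "claims": {"P31": []}}',
    '{"item_quality": "A", "rev_id": 2, "strata": 80000, "page_len": 81000, "page_title": "Q900002", "claims": {"P31": []}}',
    '{"item_quality": "C", "rev_id": 3, "strata": 1200, "page_len": 1100, "page_title": "Q900003", "claims": {"P31": []}}',
]
sample_counts = count_quality_classes(sample_lines)
```

Run against the real file, the tally should match the per-class counts listed above, which sum to 5000.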
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains information about commercial organizations (companies) and their relations with other commercial organizations, persons, products, locations, groups and industries. The dataset has the form of a graph. It has been produced by the SmartDataLake project (https://smartdatalake.eu), using data collected from Wikidata (https://www.wikidata.org).
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a collection of Gender Indicators from Wikidata and Wikipedia of Human Biographies. Data is derived from the 2016-01-03 Wikidata snapshot.
Each file describes the humans in Wikidata aggregated by Gender (Property:P21), and disaggregated by the following Wikidata Properties:
- Date of Birth (P569)
- Date of Death (P570)
- Place of Birth (P19)
- Country of Citizenship (P27)
- Ethnic Group (P172)
- Field of Work (P101)
- Occupation (P106)
- Wikipedia Language ("Sitelinks")
Further aggregations of the data are:
- World Map (Countries derived from place of birth and citizenship)
- World Cultures (Inglehart Welzel Map applied to World Map)
- Gender Co-Occurrence (Humans with multiple genders).
Wikidata labels have been translated to English for convenience when possible. You may still see values with "QIDs", which means there was no English translation possible. In the case where there were multiple values, such as for occupation, we count the gender as co-occurring with each occupation separately.
For more information, see http://wigi.wmflabs.org/
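The counting rule for multi-valued properties can be sketched as follows. This is an illustration under assumptions, not the project's actual code: the function name, the dictionary layout (property ID mapped to a list of English labels), and the sample people are hypothetical.

```python
from collections import Counter


def count_gender_by_property(humans, prop):
    """Count (gender, value) pairs, counting each value of a
    multi-valued property separately."""
    counts = Counter()
    for human in humans:
        for gender in human.get("P21", []):
            for value in human.get(prop, []):
                counts[(gender, value)] += 1
    return counts


# Made-up records: property ID -> list of labels.
sample_humans = [
    {"P21": ["female"], "P106": ["physicist", "chemist"]},
    {"P21": ["male"], "P106": ["physicist"]},
    {"P21": ["female"], "P106": []},
]
occupation_counts = count_gender_by_property(sample_humans, "P106")
```

A person with two occupations contributes one count to each (gender, occupation) pair, and a person with no occupation contributes nothing to this breakdown.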
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
mapping between freebase and wikidata entities
This dataset maps Freebase IDs to Wikidata IDs and labels. It is useful for visualising and better understanding the data when working with datasets like fb15k-237.
How it was created:
- Download the Freebase-Wikidata mapping from here. [compressed size: 21.2 MB]
- Download the Wikidata entities data from here. [compressed size: 81 GB]
- Align labels with the Freebase and Wikidata IDs.
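The alignment step can be pictured as a simple join. The sketch below assumes both downloads have already been loaded into dictionaries; the function name, the row layout, and the placeholder IDs and labels are all hypothetical, since the listing does not describe the file formats.

```python
def align_labels(freebase_to_wikidata, wikidata_labels):
    """Join a Freebase->Wikidata ID mapping with Wikidata labels.
    Entities without a known label get None."""
    rows = []
    for freebase_id, wikidata_id in sorted(freebase_to_wikidata.items()):
        rows.append(
            {
                "freebase_id": freebase_id,
                "wikidata_id": wikidata_id,
                "label": wikidata_labels.get(wikidata_id),
            }
        )
    return rows


# Placeholder IDs and labels, made up for illustration only.
sample_mapping = {"/m/0001": "Q900001", "/m/0002": "Q900002"}
sample_labels = {"Q900001": "example label one"}
sample_rows = align_labels(sample_mapping, sample_labels)
```

Keeping unlabeled entities with a `None` label, rather than dropping them, makes gaps in the label data visible when inspecting a dataset such as fb15k-237.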
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains mappings between Wikidata entities and Wikipedia sections. The mappings come in addition to the existing Wikidata sitelinks referencing Wikipedia pages.
The creation of the present dataset stems from the observation that only a fraction of Wikidata entities has a corresponding Wikipedia article in any language (we refer to the remaining entities, without an article, as orphans). However, a substantial number of orphan entities are indeed available in Wikipedia, but not at the page level; orphan entities can be described within existing Wikipedia articles in the form of sections, subsections, and paragraphs of a more generic concept or fact. The dataset provides a fine-grained mapping between Wikidata orphan entities and Wikipedia (sub)-sections.
Mappings are provided for the English language.
The dataset is available in JSON and RDF formats and complies with the Wikibase data model.
In the JSON representation, an entity contains two fields: id (the unique identifier of an entity) and sectionlinks (links to Wikipedia sections). Each sectionlinks entry comprises a list of records [1] with three fields: site, title, and url. A section title is appended to the page title, separated by the # symbol. Such a compound title is then URL-encoded and added to the URL path. Following the Wikidata guidelines, each entity is encoded as a single line.
Example:
{
  "id": "Q715509",
  "sectionlinks": {
    "enwiki": [
      {
        "site": "enwiki",
        "title": "Places in Harry Potter#Azkaban",
        "url": "https://en.wikipedia.org/wiki/Places_in_Harry_Potter#Azkaban"
      }
    ]
  }
}
The RDF dump is serialized using the Turtle format and stores nodes describing Wikipedia links. Section titles are added in the same manner as described above.
Example:
<https://en.wikipedia.org/wiki/Places_in_Harry_Potter#Azkaban> a schema:Article ;
    schema:about wd:Q715509 ;
    schema:inLanguage "en" ;
    schema:isPartOf <https://en.wikipedia.org/> ;
    schema:name "Places in Harry Potter#Azkaban"@en .
<https://en.wikipedia.org/> wikibase:wikiGroup "wikipedia" .
[1] As opposed to sitelinks, where each entity can be mapped to a unique Wikipedia page (one-to-one mapping), sectionlinks allow a one-to-many mapping, i.e., an entity can be mapped to multiple sections. For example, the Tennis racket concept can be mapped to the Tennis#Rackets and Racket (sports equipment)#Tennis sections.
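The way a compound title becomes a URL and a single-line JSON record can be sketched as below. This is an assumption-laden illustration rather than the dataset's own tooling: the function names are hypothetical, spaces are assumed to become underscores (as in the example URL), and Python's `urllib.parse.quote` also percent-encodes characters such as parentheses, which may or may not match the published files.

```python
import json
from urllib.parse import quote

ENWIKI_BASE = "https://en.wikipedia.org/wiki/"


def build_sectionlink(title):
    """Build one sectionlink record from a 'Page title#Section title' string."""
    page, sep, section = title.partition("#")
    # Assumption: spaces map to underscores before percent-encoding.
    path = quote(page.replace(" ", "_"))
    if sep:
        path += "#" + quote(section.replace(" ", "_"))
    return {"site": "enwiki", "title": title, "url": ENWIKI_BASE + path}


def encode_entity(entity_id, titles):
    """Serialize an entity and its section links as a single JSON line."""
    records = [build_sectionlink(t) for t in titles]
    return json.dumps({"id": entity_id, "sectionlinks": {"enwiki": records}})


line = encode_entity("Q715509", ["Places in Harry Potter#Azkaban"])
```

Because an entity may map to several sections, `encode_entity` takes a list of titles and emits them all under the same `enwiki` key.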
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Profiles of politically exposed persons from Wikidata, the structured data version of Wikipedia.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Wikidata offers a wide range of general data about our universe as well as links to other databases. The data is published under the CC0 "Public domain dedication" license. It can be edited by anyone and is maintained by Wikidata's editor community.
License: Open Data Commons Attribution License (ODC-By) v1.0, https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
For questions or use cases calling for large, multi-use aggregate data files, please visit the EOL Services forum at http://discuss.eol.org/c/eol-services
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A copy of a dump that was available from Wikimedia: https://dumps.wikimedia.org/wikidatawiki/entities/
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains corrections for Wikidata constraint violations extracted from the July 1st 2018 Wikidata full history dump.
The following constraints are considered:
* conflicts with: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Conflicts_with
* distinct values: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Unique_value
* inverse and symmetric: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Inverse https://www.wikidata.org/wiki/Help:Property_constraints_portal/Symmetric
* item requires statement: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Item
* one of: https://www.wikidata.org/wiki/Help:Property_constraints_portal/One_of
* single value: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Single_value
* type: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Type
* value requires statement: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Target_required_claim
* value type: https://www.wikidata.org/wiki/Help:Property_constraints_portal/Value_type
The constraints.tsv file contains the list of most of the Wikidata constraints considered in this dataset (beware, there could be some discrepancies for type, valueType, itemRequiresClaim and valueRequiresClaim constraints).
It is a tab-separated file with the following columns:
* constraint id: the URI of the Wikidata statement describing the constraint
* property id: the URI of the property that is constrained
* type id: the URI of the constraint type (type, value type...). It is a Wikidata item.
* 15 columns for the possible attributes of the constraint. If an attribute has multiple values, they are in the same cell but separated by a space. The columns are:
** regex: https://www.wikidata.org/wiki/Property:P1793
** exceptions: https://www.wikidata.org/wiki/Property:P2303
** group by: https://www.wikidata.org/wiki/Property:P2304
** items: https://www.wikidata.org/wiki/Property:P2305
** property: https://www.wikidata.org/wiki/Property:P2306
** namespace: https://www.wikidata.org/wiki/Property:P2307
** class: https://www.wikidata.org/wiki/Property:P2308
** relation: https://www.wikidata.org/wiki/Property:P2309
** minimal date: https://www.wikidata.org/wiki/Property:P2310
** maximum date: https://www.wikidata.org/wiki/Property:P2311
** maximum value: https://www.wikidata.org/wiki/Property:P2312
** minimal value: https://www.wikidata.org/wiki/Property:P2313
** status: https://www.wikidata.org/wiki/Property:P2316
** separator: https://www.wikidata.org/wiki/Property:P4155
** scope: https://www.wikidata.org/wiki/Property:P5314
The other files provide, for each constraint type, the list of all corrections extracted from the edit history. The format of each file is one line per correction with the following tab-separated values:
* URI of the statement describing the constraint in Wikidata
* URI of the revision that has solved the constraint violation
* subject, predicate and object of the triple that was violating the constraint (separated by a tab)
* the string "->"
* subject, predicate and object of the triple(s) of the correction, each followed by "http://wikiba.se/history/ontology#deletion" if the triple has been removed or "http://wikiba.se/history/ontology#addition" if the triple has been added. Each component of these values is separated by a tab.
More detailed explanations are provided in a soon-to-be-published paper.
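A reader for the correction files could look roughly like the sketch below. It follows the field order described above but is otherwise hypothetical: the function name, the returned dictionary layout, and the assumption that no field contains an embedded tab are not confirmed by the dataset description, and the sample line uses made-up example.org URIs.

```python
ADDITION = "http://wikiba.se/history/ontology#addition"
DELETION = "http://wikiba.se/history/ontology#deletion"


def parse_correction(line):
    """Split one tab-separated correction line into its parts."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6 or fields[5] != "->":
        raise ValueError("expected '->' as the sixth tab-separated field")
    rest = fields[6:]
    # Each correction triple is subject, predicate, object, operation.
    if len(rest) % 4 != 0:
        raise ValueError("correction part is not a sequence of 4-field groups")
    changes = []
    for i in range(0, len(rest), 4):
        s, p, o, op = rest[i : i + 4]
        if op not in (ADDITION, DELETION):
            raise ValueError("unknown operation: " + op)
        changes.append({"triple": (s, p, o), "added": op == ADDITION})
    return {
        "constraint": fields[0],
        "revision": fields[1],
        "violation": tuple(fields[2:5]),
        "changes": changes,
    }


# Made-up line: one violating triple is deleted and a replacement is added.
sample_line = "\t".join(
    [
        "http://example.org/constraint", "http://example.org/revision",
        "http://example.org/s", "http://example.org/p", "http://example.org/o1",
        "->",
        "http://example.org/s", "http://example.org/p", "http://example.org/o1", DELETION,
        "http://example.org/s", "http://example.org/p", "http://example.org/o2", ADDITION,
    ]
)
sample_correction = parse_correction(sample_line)
```

Raising on malformed lines, rather than skipping them silently, makes it easier to notice if the real files deviate from the described layout.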