76 datasets found
  1. freebase-wikidata-mapping

    • huggingface.co
    Updated Mar 28, 2024
    + more versions
    Cite
    Knowledge Discovery & Management Lab, DA-IICT (2024). freebase-wikidata-mapping [Dataset]. https://huggingface.co/datasets/kdm-daiict/freebase-wikidata-mapping
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Mar 28, 2024
    Dataset authored and provided by
    Knowledge Discovery & Management Lab, DA-IICT
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Mapping between Freebase and Wikidata entities

    This dataset maps Freebase IDs to Wikidata IDs and labels. It is useful for visualisation and for better understanding when working with datasets like FB15k-237.

    How it was created:

    1. Download the Freebase-Wikidata mapping from here [compressed size: 21.2 MB].
    2. Download the Wikidata entities data from here [compressed size: 81 GB].
    3. Align the labels with the Freebase and Wikidata IDs.
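
    The alignment step above is essentially a dictionary lookup keyed on the Freebase MID. A minimal offline sketch, using two hypothetical rows in place of the real dataset (the column names are assumptions, not the dataset's documented schema):

```python
# Hypothetical rows mimicking the dataset's Freebase-MID -> Wikidata mapping.
mapping_rows = [
    {"freebase_id": "/m/02mjmr", "wikidata_id": "Q76", "label": "Barack Obama"},
    {"freebase_id": "/m/0d05w3", "wikidata_id": "Q148", "label": "People's Republic of China"},
]

# Build a lookup table keyed on the Freebase MID.
fb_to_wd = {row["freebase_id"]: (row["wikidata_id"], row["label"]) for row in mapping_rows}

def resolve(mid: str) -> str:
    """Resolve an FB15k-237-style MID to a human-readable 'label (QID)' string."""
    qid, label = fb_to_wd.get(mid, (None, None))
    return f"{label} ({qid})" if qid else mid

print(resolve("/m/02mjmr"))  # Barack Obama (Q76)
```

    With the real dataset loaded the same lookup makes FB15k-237 triples human-readable.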

  2. wikidata-20240701-all.json.bz2

    • academictorrents.com
    bittorrent
    Updated Aug 30, 2024
    + more versions
    Cite
    Wikidata Contributors (2024). wikidata-20240701-all.json.bz2 [Dataset]. https://academictorrents.com/details/dc083577b9f773ef0d41a3eba21b8694d5a56e99
    Explore at:
    bittorrent (89940529332 bytes)
    Dataset updated
    Aug 30, 2024
    Dataset provided by
    Wikidata (https://wikidata.org/)
    Authors
    Wikidata Contributors
    License

    No license specified: https://academictorrents.com/nolicensespecified

    Description

    A BitTorrent file to download data with the title 'wikidata-20240701-all.json.bz2'
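
    Once downloaded, the full dump is a single bz2-compressed JSON array with one entity per line, so it can be streamed without decompressing to disk. A sketch assuming the standard dump layout, demonstrated on a tiny synthetic file:

```python
import bz2
import json

def iter_entities(path):
    """Stream entities from a wikidata-*-all.json.bz2 dump.

    The dump is one large JSON array, but each entity sits on its own line,
    so each line can be parsed independently after stripping the trailing comma.
    """
    with bz2.open(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip().rstrip(",")
            if line in ("[", "]", ""):
                continue  # skip the array brackets
            yield json.loads(line)

# Tiny synthetic dump standing in for the real ~84 GB file.
sample = b'[\n{"id": "Q42", "type": "item"},\n{"id": "Q1", "type": "item"}\n]\n'
with open("mini-dump.json.bz2", "wb") as f:
    f.write(bz2.compress(sample))

ids = [e["id"] for e in iter_entities("mini-dump.json.bz2")]
print(ids)  # ['Q42', 'Q1']
```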

  3. QALD-10 Wikidata Dump

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Jan 3, 2023
    Cite
    Usbeck, Ricardo; Yan, Xi; Perevalov, Aleksandr; Jiang, Longquan; Schulz, Julius; Kraft, Angelie; Möller, Cedric; Huang, Junbo; Reineke, Jan; Ngonga Ngomo, Axel-Cyrille; Saleem, Muhammad; Both, Andreas (2023). QALD-10 Wikidata Dump [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_7496689
    Explore at:
    Dataset updated
    Jan 3, 2023
    Dataset provided by
    Universität Hamburg
    Leipzig University of Applied Sciences
    Universität Paderborn
    Authors
    Usbeck, Ricardo; Yan, Xi; Perevalov, Aleksandr; Jiang, Longquan; Schulz, Julius; Kraft, Angelie; Möller, Cedric; Huang, Junbo; Reineke, Jan; Ngonga Ngomo, Axel-Cyrille; Saleem, Muhammad; Both, Andreas
    License

    Attribution 1.0 (CC BY 1.0): https://creativecommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    This data dump of Wikidata is published to allow fair and replicable evaluation of KGQA systems with the QALD-10 benchmark. QALD-10 is newly released and was used in the QALD-10 Challenge. Anyone interested in evaluating their KGQA systems with QALD-10 can download this dump and set up a local Wikidata endpoint on their own server.

  4. Kensho Derived Wikimedia Dataset

    • kaggle.com
    zip
    Updated Jan 24, 2020
    Cite
    Kensho R&D (2020). Kensho Derived Wikimedia Dataset [Dataset]. https://www.kaggle.com/kenshoresearch/kensho-derived-wikimedia-data
    Explore at:
    zip (8760044227 bytes)
    Dataset updated
    Jan 24, 2020
    Authors
    Kensho R&D
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Kensho Derived Wikimedia Dataset

    Wikipedia, the free encyclopedia, and Wikidata, the free knowledge base, are crowd-sourced projects supported by the Wikimedia Foundation. Wikipedia is nearly 20 years old and recently added its six millionth article in English. Wikidata, its younger machine-readable sister project, was created in 2012 but has been growing rapidly and currently contains more than 75 million items.

    These projects contribute to the Wikimedia Foundation's mission of empowering people to develop and disseminate educational content under a free license. They are also heavily utilized by computer science research groups, especially those interested in natural language processing (NLP). The Wikimedia Foundation periodically releases snapshots of the raw data backing these projects, but these are in a variety of formats and were not designed for use in NLP research. In the Kensho R&D group, we spend a lot of time downloading, parsing, and experimenting with this raw data. The Kensho Derived Wikimedia Dataset (KDWD) is a condensed subset of the raw Wikimedia data in a form that we find helpful for NLP work. The KDWD has a CC BY-SA 3.0 license, so feel free to use it in your work too.


    This particular release consists of two main components - a link annotated corpus of English Wikipedia pages and a compact sample of the Wikidata knowledge base. We version the KDWD using the raw Wikimedia snapshot dates. The version string for this dataset is kdwd_enwiki_20191201_wikidata_20191202 indicating that this KDWD was built from the English Wikipedia snapshot from 2019 December 1 and the Wikidata snapshot from 2019 December 2. Below we describe these components in more detail.

    Example Notebooks

    Dive right in by checking out some of our example notebooks:

    Updates / Changelog

    • initial release 2020-01-31

    File Summary

    • Wikipedia
      • page.csv (page metadata and Wikipedia-to-Wikidata mapping)
      • link_annotated_text.jsonl (plaintext of Wikipedia pages with link offsets)
    • Wikidata
      • item.csv (item labels and descriptions in English)
      • item_aliases.csv (item aliases in English)
      • property.csv (property labels and descriptions in English)
      • property_aliases.csv (property aliases in English)
      • statements.csv (truthy qpq statements)
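
    A typical first step with these files is joining statements.csv against item.csv to get human-readable triples. A pandas sketch with tiny inline stand-ins for the real files (the column names here are assumptions; check the dataset's own headers):

```python
import pandas as pd

# Stand-ins for item.csv and statements.csv (assumed column names).
items = pd.DataFrame({
    "item_id": [5, 30, 76],
    "en_label": ["human", "United States", "Barack Obama"],
})
statements = pd.DataFrame({
    "source_item_id": [76],
    "edge_property_id": [31],   # P31: instance of
    "target_item_id": [5],
})

# Resolve both ends of each statement to English labels.
triples = (
    statements
    .merge(items, left_on="source_item_id", right_on="item_id")
    .merge(items, left_on="target_item_id", right_on="item_id",
           suffixes=("_source", "_target"))
)
print(triples.loc[0, "en_label_source"], "-P31->", triples.loc[0, "en_label_target"])
```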

    Three Layers of Data

    The KDWD is three connected layers of data. The base layer is a plain text English Wikipedia corpus, the middle layer annotates the corpus by indicating which text spans are links, and the top layer connects the link text spans to items in Wikidata. Below we'll describe these layers in more detail.

    [Figure: the three connected layers of KDWD data - plain-text corpus, link annotations, and Wikidata items]

    Wikipedia Sample

    The first part of the KDWD is derived from Wikipedia. In order to create a corpus of mostly natural text, we restrict our English Wikipedia page sample to those that:

  5. wikidata-label-maps-20250820

    • huggingface.co
    Updated Aug 20, 2025
    Cite
    Yash Kumar Atri (2025). wikidata-label-maps-20250820 [Dataset]. https://huggingface.co/datasets/yashkumaratri/wikidata-label-maps-20250820
    Explore at:
    Dataset updated
    Aug 20, 2025
    Authors
    Yash Kumar Atri
    License

    CC0 1.0: https://choosealicense.com/licenses/cc0-1.0/

    Description

    Wikidata Label Maps 2025-08-20

    Label maps extracted from the 2025-08-20 Wikidata dump. Use these to resolve Q and P identifiers to English labels quickly.

    Files

    • entity_map.parquet - columns: id, label, description. Q items; 77.4M rows.
    • prop_map.parquet - columns: id, label, description, datatype. P items; 11,568 rows.

    All files are Parquet with Zstandard compression.

    Download Options

    A) Hugging Face snapshot to a local folder

    from huggingface_hub import… See the full description on the dataset page: https://huggingface.co/datasets/yashkumaratri/wikidata-label-maps-20250820.
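
    Once the two parquet files are loaded (e.g. with pandas.read_parquet), resolving identifiers reduces to a dictionary substitution. A sketch with hypothetical in-memory rows standing in for the real files:

```python
import re

# Hypothetical rows standing in for entity_map.parquet / prop_map.parquet.
labels = {
    "Q42": "Douglas Adams",
    "Q5": "human",
    "P31": "instance of",
}

def label_ids(text: str) -> str:
    """Replace bare Q/P identifiers in a string with their English labels,
    leaving unknown identifiers untouched."""
    return re.sub(r"\b[QP]\d+\b", lambda m: labels.get(m.group(), m.group()), text)

print(label_ids("Q42 P31 Q5"))  # Douglas Adams instance of human
```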

  6. External References of English Wikipedia (ref-wiki-en)

    • live.european-language-grid.eu
    • data.niaid.nih.gov
    txt
    Updated Mar 27, 2024
    Cite
    (2024). External References of English Wikipedia (ref-wiki-en) [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7625
    Explore at:
    txt
    Dataset updated
    Mar 27, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    External References of English Wikipedia (ref-wiki-en) is a corpus of the plain-text content of 2,475,461 external webpages linked from the reference section of articles in English Wikipedia. Specifically:

    1. 32,329,989 external reference URLs were extracted from a 2018 HTML dump of English Wikipedia. Removing repeated and ill-formed URLs yielded 23,036,318 unique URLs.
    2. These URLs were filtered to remove file extensions for unsupported formats (videos, audio, etc.), yielding 17,781,974 downloadable URLs.
    3. The URLs were loaded into Apache Nutch and continuously downloaded from August 2019 to December 2019, resulting in 2,475,461 successfully downloaded URLs. Not all URLs could be accessed. The order in which URLs were accessed was determined by Nutch, which partitions URLs by host and then randomly chooses amongst the URLs for each host.
    4. The content of these webpages was indexed in Apache Solr by Nutch. From Solr we extracted a JSON dump of the content.
    5. Many URLs offer a redirect; unfortunately Nutch does not index redirect information, which complicated connecting the Wikipedia article (with the pre-redirect link) to the downloaded webpage (at the post-redirect link). However, by inspecting the order of download in the Nutch log files, we managed to recover links for 2,058,896 documents (83%) to their original Wikipedia article(s).
    6. We further managed to associate 3,899,953 unique Wikidata items with at least one external reference webpage in the corpus.

    The ref-wiki-en corpus is incomplete, i.e., we did not attempt to download all reference URLs for English Wikipedia. We thus also collected a smaller, complete corpus of the external references of 5,000 Wikipedia articles (ref-wiki-en-5k). We sampled from 5 ranges of Wikidata items: Q1-10000, Q10001-100000, Q100001-1000000, Q1000001-10000000, and Q10000001-100000000. From each range we sampled 1,000 items. We then scraped the external reference URLs for the Wikipedia articles corresponding to these items and downloaded them. The resulting corpus contains 37,983 webpages.

    Each line of the corpus (ref-wiki-en, ref-wiki-en-5k) encodes the webpage of an external reference in JSON format. Specifically, we provide:

    • tstamp: When the webpage was accessed.
    • host: The domain (FQDN, post-redirect) from which the webpage was retrieved.
    • title: The title (meta) of the document.
    • url: The URL (post-redirect) of the webpage.
    • Q: The Q-code identifiers of the Wikidata items whose corresponding Wikipedia article is confirmed to link to this webpage.
    • content: A plain-text encoding of the content of the webpage.

    Below we provide an abbreviated example of a line from the corpus:

    {"tstamp":"2019-09-26T01:22:43.621Z","host":"geology.isu.edu","title":"Digital Geology of Idaho - Basin And Range","url":"http://geology.isu.edu/Digital_Geology_Idaho/Module9/mod9.htm","Q":[810178],"content":"Digital Geology of Idaho - Basin And Range 1 - Idaho Basement Rock 2 - Belt Supergroup 3 - Rifting & Passive Margin ... The Basin and Range Province generally includes most of eastern California, eastern Oregon, eastern Washington, Nevada, western Utah, southern and western Arizona, and southeastern Idaho. ..."}

    A summary of the files we make available:

    • ref-wiki-en.json.gz: 2,475,461 external reference webpages (JSON format)
    • ref-wiki-en_urls.txt.gz: 23,036,318 unique raw links to external references (plain-text format)
    • ref-wiki-en-5k.json.gz: 37,983 external reference webpages (JSON format)
    • ref-wiki-en-5k_urls.json.gz: 70,375 unique raw links to external references (plain-text format)
    • ref-wiki-en-5k_Q.txt.gz: 5,000 Wikidata Q identifiers forming the 5k dataset (plain-text format)
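
    Since each line of ref-wiki-en.json.gz is a standalone JSON object, the corpus can be streamed with gzip + json. A minimal sketch over a synthetic one-line file standing in for the real corpus:

```python
import gzip
import json

# A synthetic stand-in for one line of ref-wiki-en.json.gz.
record = {"tstamp": "2019-09-26T01:22:43.621Z", "host": "geology.isu.edu",
          "title": "Digital Geology of Idaho - Basin And Range",
          "url": "http://geology.isu.edu/Digital_Geology_Idaho/Module9/mod9.htm",
          "Q": [810178], "content": "Digital Geology of Idaho ..."}
with gzip.open("mini-ref-wiki-en.json.gz", "wt", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")

# Stream the corpus and index webpage URLs by the Wikidata items that cite them.
pages_by_item = {}
with gzip.open("mini-ref-wiki-en.json.gz", "rt", encoding="utf-8") as f:
    for line in f:
        page = json.loads(line)
        for q in page.get("Q", []):
            pages_by_item.setdefault(f"Q{q}", []).append(page["url"])

print(pages_by_item)  # one URL indexed under 'Q810178'
```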

    Further details can be found in the publication:

    Suggesting References for Wikidata Claims based on Wikipedia's External References. Paolo Curotto, Aidan Hogan. Wikidata Workshop @ISWC 2020.

    Further material relating to this publication (including code for a proof-of-concept interface) is also available.

  7. Wikidata jsons

    • kaggle.com
    zip
    Updated Feb 4, 2020
    + more versions
    Cite
    Timo Bozsolik (2020). Wikidata jsons [Dataset]. https://www.kaggle.com/timoboz/wikidata-jsons
    Explore at:
    zip (899549129 bytes)
    Dataset updated
    Feb 4, 2020
    Authors
    Timo Bozsolik
    License

    Attribution-NoDerivs 4.0 (CC BY-ND 4.0): https://creativecommons.org/licenses/by-nd/4.0/
    License information was derived automatically

    Description

    This is a collection of pre-processed Wikidata JSONs that were used in the creation of the CSQA dataset (ref: https://arxiv.org/abs/1801.10314).

    Please refer to https://amritasaha1812.github.io/CSQA/download/ for more details.

  8. NILK

    • data.niaid.nih.gov
    Updated Mar 26, 2023
    Cite
    Iurshina, Anastasiia; Pan, Jiaxin; Boutalbi, Rafika (2023). NILK [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6599939
    Explore at:
    Dataset updated
    Mar 26, 2023
    Dataset provided by
    Stuttgart University
    Authors
    Iurshina, Anastasiia; Pan, Jiaxin; Boutalbi, Rafika
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    A dataset for the NIL-detection and NIL-disambiguation tasks.

    The NILK dataset has two main features: 1) It marks NIL-mentions for NIL-detection by extracting mentions which belong to newly added entities in Wikipedia text. 2) It provides an entity label for NIL-disambiguation by marking NIL-mentions with WikiData IDs from the newer dump.

    Dataset files contain JSON objects of the following structure:

    {"mention":"Walter Damrosch", "offset":348, "length":15, "context":"...the conductor Walter Damrosch. He scored the piece for the standard instruments of the symphony orchestra plus celesta, saxophone, and automobile horns...", "wikipedia_page_id":"309", "wikidata_id":"Q725579", "nil":false}

    The dataset contains both linked and not-linked mentions; one can distinguish between them by checking the "nil" flag. To obtain NIL-mentions, we compared two WikiData dumps, from 2017 and 2021. NIL-mentions carry a WikiData ID from the 2021 dump, which one can use to check whether mentions refer to the same entity.
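
    Splitting the corpus by the "nil" flag is a one-pass filter. A sketch over two hypothetical lines in the documented record format (the second record is invented for illustration):

```python
import json

# Two hypothetical lines in the documented NILK record format.
lines = [
    '{"mention": "Walter Damrosch", "offset": 348, "length": 15, '
    '"wikipedia_page_id": "309", "wikidata_id": "Q725579", "nil": false}',
    '{"mention": "New Startup", "offset": 10, "length": 11, '
    '"wikipedia_page_id": "12345", "wikidata_id": "Q99999999", "nil": true}',
]

# Partition mentions into linked entities and NIL-mentions.
records = [json.loads(line) for line in lines]
linked = [r["mention"] for r in records if not r["nil"]]
nil_mentions = [r["mention"] for r in records if r["nil"]]

print(linked)        # ['Walter Damrosch']
print(nil_mentions)  # ['New Startup']
```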

    The dataset was designed with the WikiData 2017 as the target knowledge base in mind: https://archive.org/download/wikibase-wikidatawiki-20170213/wikidata-20170213-all.json.gz

    nilk_03_2023.zip contains the same data with longer (unsplit) contexts.

  9. Wiki80-KG

    • figshare.com
    json
    Updated Sep 2, 2025
    Cite
    Hongmin Xiao (2025). Wiki80-KG [Dataset]. http://doi.org/10.6084/m9.figshare.19323371.v2
    Explore at:
    json
    Dataset updated
    Sep 2, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Hongmin Xiao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Relation extraction dataset with its knowledge graph.

  10. English Wikipedia People Dataset

    • kaggle.com
    zip
    Updated Jul 31, 2025
    Cite
    Wikimedia (2025). English Wikipedia People Dataset [Dataset]. https://www.kaggle.com/datasets/wikimedia-foundation/english-wikipedia-people-dataset
    Explore at:
    zip (4293465577 bytes)
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    Wikimedia Foundation (http://www.wikimedia.org/)
    Authors
    Wikimedia
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Summary

    This dataset contains biographical information derived from articles on English Wikipedia as it stood in early June 2024. It was created as part of the Structured Contents initiative at Wikimedia Enterprise and is intended for evaluation and research use.

    The beta sample dataset is a subset of the Structured Contents Snapshot, focusing on people with infoboxes in English Wikipedia; it is output as JSON files (compressed in tar.gz).

    We warmly welcome any feedback you have. Please share your thoughts, suggestions, and any issues you encounter on the discussion page for this dataset here on Kaggle.

    Data Structure

    • File name: wme_people_infobox.tar.gz
    • Size of compressed file: 4.12 GB
    • Size of uncompressed file: 21.28 GB

    Noteworthy Included Fields:

    • name - title of the article.
    • identifier - ID of the article.
    • image - main image representing the article's subject.
    • description - one-sentence description of the article for quick reference.
    • abstract - lead section, summarizing what the article is about.
    • infoboxes - parsed information from the side panel (infobox) on the Wikipedia article.
    • sections - parsed sections of the article, including links. Note: excludes other media/images, lists, tables, and references or similar non-prose sections.

    The Wikimedia Enterprise Data Dictionary explains all of the fields in this dataset.
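
    Since the archive is a tar.gz of JSON files, Python's tarfile module can stream articles without a full extraction. A sketch over a synthetic one-file archive (the file layout inside the archive and the sample record are assumptions; field names follow the list above):

```python
import io
import json
import tarfile

# Build a tiny synthetic archive standing in for wme_people_infobox.tar.gz.
article = {"name": "Ada Lovelace", "identifier": 1234,
           "description": "English mathematician and writer"}
payload = json.dumps(article).encode("utf-8")
with tarfile.open("mini_people.tar.gz", "w:gz") as tar:
    info = tarfile.TarInfo(name="people/ada_lovelace.json")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Stream the archive and collect (name, description) pairs.
people = []
with tarfile.open("mini_people.tar.gz", "r:gz") as tar:
    for member in tar:
        if member.isfile() and member.name.endswith(".json"):
            data = json.load(tar.extractfile(member))
            people.append((data["name"], data["description"]))

print(people)  # [('Ada Lovelace', 'English mathematician and writer')]
```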

    Stats

    Infoboxes only:
    • Compressed: 2 GB
    • Uncompressed: 11 GB

    Infoboxes + sections + short description:
    • Size of compressed file: 4.12 GB
    • Size of uncompressed file: 21.28 GB

    Article analysis and filtering breakdown:
    • Total # of articles analyzed: 6,940,949
    • # of people found with QID: 1,778,226
    • # of people found with Category: 158,996
    • # of people found with Biography Project: 76,150
    • Total # of people articles found: 2,013,372
    • Total # of people articles with infoboxes: 1,559,985

    End stats:
    • Total number of people articles in this dataset: 1,559,985
      • that have a short description: 1,416,701
      • that have an infobox: 1,559,985
      • that have article sections: 1,559,921

    This dataset includes 235,146 people articles that exist on Wikipedia but aren't yet tagged on Wikidata as instance of:human.

    Maintenance and Support

    This dataset was originally extracted from the Wikimedia Enterprise APIs on June 5, 2024, so the information in it may be out of date. The dataset isn't being actively updated or maintained and has been shared for community use and feedback. If you'd like to retrieve up-to-date Wikipedia articles or data from other Wikiprojects, get started with Wikimedia Enterprise's APIs.

    Initial Data Collection and Normalization

    The dataset is built from the Wikimedia Enterprise HTML “snapshots”: https://enterprise.wikimedia.com/docs/snapshot/ and focuses on the Wikipedia article namespace (namespace 0 (main)).

    Who are the source language producers?

    Wikipedia is a human-generated corpus of free knowledge, written, edited, and curated by a global community of editors since 2001. It is the largest and most accessed educational resource in history, accessed over 20 billion times by half a billion people each month. Wikipedia represents almost 25 years of work by its community: the creation, curation, and maintenance of millions of articles on distinct topics. This dataset includes the biographical contents of the English Wikipedia (https://en.wikipedia.org/), written by the community.

    Attribution

    Terms and conditions

    Wikimedia Enterprise provides this dataset under the assumption that downstream users will adhere to the relevant free culture licenses when the data is reused. In situations where attribution is required, reusers should identify the Wikimedia project from which the content was retrieved as the source of the content. Any attribution should adhere to Wikimedia’s trademark policy (available at https://foundation.wikimedia.org/wiki/Trademark_policy) and visual identity guidelines (ava...

  11. SpeakGer: A meta-data enriched speech corpus of German state and federal parliaments

    • berd-platform.de
    csv
    Updated Jul 25, 2025
    Cite
    Kai-Robin Lange; Kai-Robin Lange; Carsten Jentsch; Carsten Jentsch (2025). SpeakGer: A meta-data enriched speech corpus of German state and federal parliaments [Dataset]. http://doi.org/10.82939/g3225-rba63
    Explore at:
    csv
    Dataset updated
    Jul 25, 2025
    Dataset provided by
    BERD@NFDI
    Authors
    Kai-Robin Lange; Kai-Robin Lange; Carsten Jentsch; Carsten Jentsch
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Germany
    Description

    A dataset of German parliamentary debates covering 74 years of plenary protocols across all 16 state parliaments of Germany as well as the German Bundestag. The debates are separated into individual speeches, which are enriched with meta-data identifying the speaker as a member of parliament (mp).

    When using this data set, please cite the original paper "Lange, K.-R., Jentsch, C. (2023). SpeakGer: A meta-data enriched speech corpus of German state and federal parliaments. Proceedings of the 3rd Workshop on Computational Linguistics for Political Text Analysis@KONVENS 2023.".

    The meta-data is separated into two different types: time-specific meta-data that holds only for a legislative period and can change over time (e.g. the party or constituency of an mp), and meta-data that is considered fixed, such as the birth date or the name of a speaker. The former is stored along with the speeches, as it is considered temporal information of that point in time, but is additionally stored in the file all_mps_mapping.csv if there is a need to double-check something. The rest of the meta-data is stored in the file all_mps_meta.csv. The meta-data from this file can be matched with a speech by comparing the speaker ID variable "MPID". The speeches of each parliament are saved in CSV format. Along with the speeches, they contain the following meta-data:

    • Period: int. The period in which the speech took place.
    • Session: int. The session in which the speech took place.
    • Chair: boolean. Whether the speaker was the chair of the plenary session.
    • Interjection: boolean. Whether the speech is a comment or an interjection from the crowd.
    • Party: list (e.g. ["cdu"], or ["cdu", "fdp"] when more than one speaker takes part in an interjection). The party of the speaker, or the parties that the comment/interjection references.
    • Consituency: string. The constituency of the speaker in the current legislative period.
    • MPID: int. The ID of the speaker, which can be used to get more meta-data from the file all_mps_meta.csv.

    The file all_mps_meta.csv contains the following meta information:

    • MPID: int. The ID of the speaker, which can be used to match the mp with his/her speeches.
    • WikipediaLink: The link to the mp's Wikipedia page.
    • WikiDataLink: The link to the mp's WikiData page.
    • Name: string. The full name of the mp.
    • Last Name: string. The last name of the mp, found on WikiData. If no last name is given on WikiData, the full name was heuristically cut at the last space to get the information necessary for splitting the speeches.
    • Born: string, format: YYYY-MM-DD. Birth date of the mp. If an exact birth date is found on WikiData, this exact date is used. Otherwise, a day in the year of birth given on Wikipedia is used.
    • SexOrGender: string. Information on the sex or gender of the mp. Disclaimer: This information was taken from WikiData, which does not seem to differentiate between sex and gender.
    • Occupation: list. Occupation(s) of the mp.
    • Religion: string. Religious beliefs of the mp.
    • AbgeordnetenwatchID: int. The ID of the mp on the website Abgeordnetenwatch.
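
    Matching a speech to its speaker's metadata via MPID, as described above, is a straightforward join. A pandas sketch with hypothetical inline rows standing in for a speeches CSV and all_mps_meta.csv (the Speech column and sample values are invented for illustration):

```python
import pandas as pd

# Inline stand-ins for a parliament's speeches file and all_mps_meta.csv.
speeches = pd.DataFrame({
    "Period": [17], "Session": [3], "MPID": [101],
    "Speech": ["Sehr geehrte Damen und Herren ..."],
})
mps_meta = pd.DataFrame({
    "MPID": [101],
    "Name": ["Erika Mustermann"],
    "Born": ["1960-01-01"],
})

# Attach the fixed speaker meta-data to each speech via the MPID key.
enriched = speeches.merge(mps_meta, on="MPID", how="left")
print(enriched.loc[0, "Name"])  # Erika Mustermann
```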

  12. New Taipei City Government Event Information_ Download Attachment Version

    • data.gov.tw
    csv
    Updated Nov 17, 2025
    Cite
    Research Development and Evaluation Commission, New Taipei City Government (2025). New Taipei City Government Event Information_ Download Attachment Version [Dataset]. https://data.gov.tw/en/datasets/139723
    Explore at:
    csv
    Dataset updated
    Nov 17, 2025
    Dataset provided by
    New Taipei City (http://www.tpc.gov.tw/)
    Research, Development and Evaluation Commission (http://archive.rdec.gov.tw/mp110.htm)
    Authors
    Research Development and Evaluation Commission, New Taipei City Government
    License

    https://data.gov.tw/license

    Area covered
    New Taipei City
    Description

    This dataset has been adjusted to match the revamp of the city government's official website. It replaces the existing "New Taipei City Government Event Information" on the platform, adds attachment downloads, and does not include HTML syntax. For details, please refer to the latest news and instruction files.

  13. Which keywords of Digital Downloads are trending on WooCommerce?

    • ecommerce.aftership.com
    Updated Apr 24, 2025
    + more versions
    Cite
    AfterShip (2025). Which keywords of Digital Downloads are trending on WooCommerce? [Dataset]. https://ecommerce.aftership.com/product-trends/digital-downloads
    Explore at:
    Dataset updated
    Apr 24, 2025
    Dataset authored and provided by
    AfterShip (https://www.aftership.com/)
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Identify the fastest-growing Digital Downloads keywords on WooCommerce. Analyze trending scores to identify the most relevant search terms and stay ahead of market trends for your store.

  14. Address Point Rooftop Data

    • caliper.com
    cdf, dwg, dxf, gdb +9
    Updated Nov 17, 2020
    Cite
    Caliper Corporation (2020). Address Point Rooftop Data [Dataset]. https://www.caliper.com/mapping-software-data/address-point-data.htm
    Explore at:
    sql server mssql, kml, shp, postgresql, cdf, postgis, gdb, sdo, ntf, kmz, geojson, dwg, dxf
    Dataset updated
    Nov 17, 2020
    Dataset authored and provided by
    Caliper Corporation (http://www.caliper.com/)
    License

    https://www.caliper.com/license/maptitude-license-agreement.htm

    Time period covered
    2020
    Area covered
    United States
    Description

    Address point data for use with GIS mapping software, databases, and web applications are from Caliper Corporation and contain a point layer of over 48 million addresses in 22 states and the District of Columbia.

  15. Building Footprints

    • caliper.com
    cdf, dwg, dxf, gdb +9
    Updated Nov 17, 2020
    Cite
    Caliper Corporation (2020). Building Footprints [Dataset]. https://www.caliper.com/mapping-software-data/building-footprint-data.htm
    Explore at:
    dxf, gdb, postgis, cdf, kml, sdo, postgresql, geojson, kmz, shp, ntf, sql server mssql, dwg
    Dataset updated
    Nov 17, 2020
    Dataset authored and provided by
    Caliper Corporation (http://www.caliper.com/)
    License

    https://www.caliper.com/license/maptitude-license-agreement.htm

    Time period covered
    2020
    Area covered
    Canada, United States
    Description

    Area layers of US, Australia, and Canada building footprints for use with GIS mapping software, databases, and web applications.

  16. Which keywords of Digital Downloads are trending on Magento?

    • ecommerce.aftership.com
    Updated Apr 24, 2025
    + more versions
    Cite
    AfterShip (2025). Which keywords of Digital Downloads are trending on Magento? [Dataset]. https://ecommerce.aftership.com/product-trends/digital-downloads
    Explore at:
    Dataset updated
    Apr 24, 2025
    Dataset authored and provided by
    AfterShip (https://www.aftership.com/)
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Identify the fastest-growing Digital Downloads keywords on Magento. Analyze trending scores to identify the most relevant search terms and stay ahead of market trends for your store.

  17. Which keywords of Digital Downloads attract shoppers on TikTok Shop?

    • ecommerce.aftership.com
    Updated Apr 24, 2025
    + more versions
    Cite
    AfterShip (2025). Which keywords of Digital Downloads attract shoppers on TikTok Shop? [Dataset]. https://ecommerce.aftership.com/product-trends/digital-downloads
    Explore at:
    Dataset updated
    Apr 24, 2025
    Dataset authored and provided by
    AfterShip (https://www.aftership.com/)
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Discover top-performing keywords for Digital Downloads on TikTok Shop. Analyze monthly growth-rate rankings to spot trending search terms and capitalize on emerging opportunities for your store.

  18. wiki-movies

    • kaggle.com
    zip
    Updated Nov 18, 2022
    Cite
    Anton Kostin (2022). wiki-movies [Dataset]. https://www.kaggle.com/datasets/visualcomments/wikimovies/code
    Explore at:
    Available download formats: zip (32,142,478 bytes)
    Dataset updated
    Nov 18, 2022
    Authors
    Anton Kostin
    License

    http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    We obtained descriptions and categories for 130,406 movies from Wikipedia by (1) using a local Wikidata dump to find movie names and (2) using the wikipediaapi library to download the description and categories for each movie.
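    Step (1) above can be sketched with the standard library alone. This is a minimal illustration, assuming the usual one-JSON-entity-per-line layout of a Wikidata JSON dump; the `is_film` helper, function names, and sample data are hypothetical, not part of the original pipeline.

    ```python
    import json

    FILM = "Q11424"  # Wikidata item id for "film"

    def is_film(entity: dict) -> bool:
        """Check whether an entity's P31 (instance of) claims include Q11424 (film)."""
        for claim in entity.get("claims", {}).get("P31", []):
            value = claim.get("mainsnak", {}).get("datavalue", {}).get("value", {})
            if value.get("id") == FILM:
                return True
        return False

    def film_labels(dump_lines):
        """Yield (QID, English label) for each film entity in a Wikidata dump.

        Each dump line is one JSON entity inside a top-level JSON array, so
        the "[" / "]" wrapper lines and trailing commas are skipped."""
        for line in dump_lines:
            line = line.strip().rstrip(",")
            if not line or line in ("[", "]"):
                continue
            entity = json.loads(line)
            if is_film(entity):
                label = entity.get("labels", {}).get("en", {}).get("value", "")
                yield entity["id"], label

    # Tiny synthetic sample mimicking the dump layout.
    sample = [
        '{"id": "Q593644", "labels": {"en": {"value": "Chicken Run"}},'
        ' "claims": {"P31": [{"mainsnak": {"datavalue": {"value": {"id": "Q11424"}}}}]}},',
        '{"id": "Q42", "labels": {"en": {"value": "Douglas Adams"}},'
        ' "claims": {"P31": [{"mainsnak": {"datavalue": {"value": {"id": "Q5"}}}}]}},',
    ]
    print(list(film_labels(sample)))  # [('Q593644', 'Chicken Run')]
    ```

    Filtering on P31 this way avoids loading the whole dump into memory, since each line can be parsed and discarded independently.
    
    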

  19. Data from: 5logos Dataset

    • universe.roboflow.com
    zip
    Updated Jan 25, 2023
    Cite
    gabi_alexandru22@yahoo.com (2023). 5logos Dataset [Dataset]. https://universe.roboflow.com/gabi_alexandru22-yahoo-com/5logos
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 25, 2023
    Dataset provided by
    Yahoo (http://yahoo.com/)
    Authors
    gabi_alexandru22@yahoo.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Objects Bounding Boxes
    Description

    5logos

    ## Overview
    
    5logos is a dataset for object detection tasks - it contains Objects annotations for 3,717 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  20. Banking Compliance Data

    • caliper.com
    Updated Sep 19, 2023
    Cite
    Caliper Corporation (2023). Banking Compliance Data [Dataset]. https://www.caliper.com/mapping-software-data/banking-data.html
    Explore at:
    Dataset updated
    Sep 19, 2023
    Dataset authored and provided by
    Caliper Corporation (http://www.caliper.com/)
    License

    https://www.caliper.com/license/maptitude-license-agreement.htm

    Area covered
    United States
    Description

    Free layers of banking compliance data for the United States are now available to users of the current version of Maptitude. Three separate geographic files and one table are included in this download.

Cite
Knowledge Discovery & Management Lab, DA-IICT (2024). freebase-wikidata-mapping [Dataset]. https://huggingface.co/datasets/kdm-daiict/freebase-wikidata-mapping

freebase-wikidata-mapping

kdm-daiict/freebase-wikidata-mapping

Explore at:
11 scholarly articles cite this dataset (View in Google Scholar)
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 28, 2024
Dataset authored and provided by
Knowledge Discovery & Management Lab, DA-IICT
License

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

mapping between freebase and wikidata entities

This dataset maps Freebase ids to Wikidata ids and labels. It is useful for visualising and better understanding entities when working with datasets like FB15k-237. How it was created:

1. Download the freebase-wikidata mapping from here. [compressed size: 21.2 MB]
2. Download the wikidata entities data from here. [compressed size: 81 GB]
3. Align labels with the freebase and wikidata ids.
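The alignment step can be sketched as follows. This is an illustrative example only, assuming the mapping is distributed as simple `freebase_id<TAB>wikidata_id<TAB>label` rows; the function names and sample triples are hypothetical, not taken from the dataset's own tooling.

```python
import csv
import io

def load_mapping(tsv_file):
    """Read freebase_id -> (wikidata_id, label) from a TSV stream."""
    mapping = {}
    for fb_id, wd_id, label in csv.reader(tsv_file, delimiter="\t"):
        mapping[fb_id] = (wd_id, label)
    return mapping

def label_triples(triples, mapping):
    """Replace FB15k-237 entity ids with human-readable labels where known,
    leaving unmapped ids unchanged."""
    for head, relation, tail in triples:
        h = mapping.get(head, (head, head))[1]
        t = mapping.get(tail, (tail, tail))[1]
        yield h, relation, t

# Illustrative sample data in the assumed TSV layout.
tsv = io.StringIO("/m/02mjmr\tQ76\tBarack Obama\n/m/09c7w0\tQ30\tUnited States\n")
mapping = load_mapping(tsv)
triples = [("/m/02mjmr", "/people/person/nationality", "/m/09c7w0")]
print(list(label_triples(triples, mapping)))
# [('Barack Obama', '/people/person/nationality', 'United States')]
```

Keeping unmapped ids as-is (the `mapping.get` fallback) matters in practice, since not every FB15k-237 entity has a surviving Wikidata counterpart.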
