100+ datasets found
  1. Data from: WikiDBs - A Large-Scale Corpus Of Relational Databases From Wikidata

    • zenodo.org
    • data.niaid.nih.gov
    text/x-python, zip
    Updated Dec 12, 2024
    Cite
    Liane Vogel; Jan-Micha Bodensohn; Carsten Binnig (2024). WikiDBs - A Large-Scale Corpus Of Relational Databases From Wikidata [Dataset]. http://doi.org/10.5281/zenodo.11559814
    Explore at:
    Available download formats: zip, text/x-python
    Dataset updated
    Dec 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Liane Vogel; Jan-Micha Bodensohn; Carsten Binnig
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    WikiDBs is an open-source corpus of 100,000 relational databases. We aim to support research on tabular representation learning on multi-table data. The corpus is based on Wikidata and aims to follow certain characteristics of real-world databases.

    WikiDBs was published as a spotlight paper in the Datasets & Benchmarks track at NeurIPS 2024.

    WikiDBs contains the database schemas as well as the table contents. The database tables are provided as CSV files, and each database schema as JSON. The 100,000 databases are available in five splits, containing 20k databases each. In total, around 165 GB of disk space is needed for the full corpus. We also provide a script to convert the databases into SQLite.
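    As a rough illustration of the layout described above, the sketch below loads one database's schema and tables with pandas; the folder and file names (e.g. schema.json) are assumptions and may differ between splits and releases.

        import json
        from pathlib import Path

        import pandas as pd

        # Hypothetical paths: one WikiDBs sub-folder holding a schema JSON
        # plus one CSV per table (actual names may differ per split/release).
        db_dir = Path("wikidbs/split_00/database_00042")

        schema = json.loads((db_dir / "schema.json").read_text(encoding="utf-8"))
        tables = {p.stem: pd.read_csv(p) for p in db_dir.glob("*.csv")}

        print(schema.get("database_name", db_dir.name), sorted(tables))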

  2. wiki-category-consistency-cache

    • explore.openaire.eu
    • data.niaid.nih.gov
    • +1 more
    Updated Jul 27, 2022
    Cite
    Leila Feddoul; Frank Löffler; Sirko Schindler (2022). wiki-category-consistency-cache [Dataset]. http://doi.org/10.5281/zenodo.6913134
    Explore at:
    Dataset updated
    Jul 27, 2022
    Authors
    Leila Feddoul; Frank Löffler; Sirko Schindler
    Description

    A collection of SQLite database files containing all the data retrieved from the Wikidata JSON dump of 2022-05-02 and the Wikipedia SQL dumps of 2022-05-01, in the context of analyzing the consistency between Wikipedia and Wikidata categories. Detailed information can be found on the GitHub page.
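    A minimal sketch for inspecting one of the SQLite files with Python's standard library; the file name used here is an assumption, as the archive contains several databases.

        import sqlite3

        # Hypothetical file name; the archive contains several SQLite databases.
        con = sqlite3.connect("wiki-category-consistency-cache/cache.sqlite")

        # List the tables, then preview a few rows from each.
        tables = [r[0] for r in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        for t in tables:
            print(t, con.execute(f"SELECT * FROM {t} LIMIT 3").fetchall())
        con.close()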

  3. WikiDBs 10k - A Corpus Of Relational Databases From Wikidata

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Nov 20, 2024
    Cite
    Liane Vogel; Carsten Binnig (2024). WikiDBs 10k - A Corpus Of Relational Databases From Wikidata [Dataset]. http://doi.org/10.5281/zenodo.8227452
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Liane Vogel; Carsten Binnig
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    WikiDBs-10k (https://wikidbs.github.io/) is a corpus of relational databases built from Wikidata (https://www.wikidata.org/). This is the preliminary 10k version; the newer 100k version (https://zenodo.org/records/11559814) includes more coherent databases and more diverse table and column names.

    The WikiDBs-10k corpus consists of 10,000 databases; for more details, see our paper: https://ceur-ws.org/Vol-3462/TADA3.pdf (TaDA@VLDB'23)

    Each database is saved in a sub-folder; the table files are provided as CSV files and the database schema as a JSON file.
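    As a sketch of what a conversion to SQLite can look like (folder and file names below are assumptions), each CSV table of one database can be written into a SQLite file with pandas:

        import sqlite3
        from pathlib import Path

        import pandas as pd

        # Hypothetical paths: one WikiDBs-10k sub-folder in, one SQLite file out.
        db_dir = Path("wikidbs-10k/database_00007")
        con = sqlite3.connect("database_00007.sqlite")

        for csv_path in db_dir.glob("*.csv"):
            # Use the CSV file name (without extension) as the table name.
            pd.read_csv(csv_path).to_sql(csv_path.stem, con,
                                         if_exists="replace", index=False)
        con.close()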

    We thank Till Döhmen and Madelon Hulsebos for generously providing the table statistics from their GitSchemas dataset and Jan-Micha Bodensohn for converting the dataset to SQLite files. This work has been supported by the BMBF and the state of Hesse as part of the NHR Program and the BMBF project KompAKI (grant number 02L19C150), as well as the HMWK cluster project 3AI. Finally, we want to thank hessian.AI, and DFKI Darmstadt for their support.

  4. Wikidata

    • bioregistry.io
    Updated Nov 13, 2021
    Cite
    (2021). Wikidata [Dataset]. http://identifiers.org/biolink:WIKIDATA
    Explore at:
    Dataset updated
    Nov 13, 2021
    License

    https://bioregistry.io/spdx:CC0-1.0

    Description

    Wikidata is a collaboratively edited knowledge base operated by the Wikimedia Foundation. It is intended to provide a common source of certain types of data which can be used by Wikimedia projects such as Wikipedia. Wikidata functions as a document-oriented database, centred on individual items. Items represent topics, for which basic information is stored that identifies each topic.
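    Individual items can be retrieved as JSON documents from the public Special:EntityData endpoint; a minimal sketch (item Q42 chosen arbitrarily):

        import requests

        # Fetch one item (Q42, Douglas Adams) as a JSON document.
        url = "https://www.wikidata.org/wiki/Special:EntityData/Q42.json"
        entity = requests.get(url, timeout=30).json()["entities"]["Q42"]

        print(entity["labels"]["en"]["value"])              # English label
        print(len(entity.get("claims", {})), "properties with statements")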

  5. Organisations of data.gouv.fr linked to Wikidata | gimi9.com

    • gimi9.com
    Cite
    Organisations of data.gouv.fr linked to Wikidata | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_5d0d24af634f411c05d9ca9b_1/
    Explore at:
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset lists the organisations of data.gouv.fr linked to the Wikidata.org database. The data is still being consolidated and should be used with caution. As an indication, reconciling the list of data.gouv.fr organisations with Wikidata can be used to:
    • analyse the nature of the organisations (public administration, joint authority, company, association, etc.);
    • obtain additional information about these organisations (e.g. GitHub or Twitter account);
    • obtain alternative labels to the data.gouv.fr labels;
    • display links to an organisation's data.gouv.fr page from the corresponding Wikipedia article;
    • get a view of the hierarchy of organisations (knowing that one organisation is a subsidiary of another).

  6. Data from: EventWiki: A knowledge base of major events

    • figshare.com
    pdf
    Updated Apr 29, 2016
    Cite
    Tao Ge; Lei Cui; Baobao Chang; Ming Zhou; Zhifang Sui (2016). EventWiki: A knowledge base of major events [Dataset]. http://doi.org/10.6084/m9.figshare.3171472.v12
    Explore at:
    Available download formats: pdf
    Dataset updated
    Apr 29, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Tao Ge; Lei Cui; Baobao Chang; Ming Zhou; Zhifang Sui
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    EventWiki is a knowledge base of major events throughout human history. It contains 21,275 events of 95 types. The details of the event entries can be found in our paper submission and documentation file. Data in the knowledge base is mainly harvested from Wikipedia. As with Wikipedia, this resource can be distributed and shared under the CC BY 3.0 license.

  7. Wikidata dump of 2015-02-23 (in RDF)

    • figshare.com
    bz2
    Updated Jun 3, 2023
    Cite
    Daniel Hernández (2023). Wikidata dump of 2015-02-23 (in RDF) [Dataset]. http://doi.org/10.6084/m9.figshare.3394369.v1
    Explore at:
    Available download formats: bz2
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    figshare
    Authors
    Daniel Hernández
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains the Wikidata dump of Feb 23, 2015, encoded in four alternative schemes for our work "Reifying RDF: What Works Well With Wikidata?", presented at the International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS), Bethlehem, Pennsylvania, Oct 11, 2015.

  8. WikiReading Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Jun 1, 2020
    Cite
    Daniel Hewlett; Alexandre Lacoste; Llion Jones; Illia Polosukhin; Andrew Fandrianto; Jay Han; Matthew Kelcey; David Berthelot (2020). WikiReading Dataset [Dataset]. https://paperswithcode.com/dataset/wikireading
    Explore at:
    Dataset updated
    Jun 1, 2020
    Authors
    Daniel Hewlett; Alexandre Lacoste; Llion Jones; Illia Polosukhin; Andrew Fandrianto; Jay Han; Matthew Kelcey; David Berthelot
    Description

    WikiReading is a large-scale natural language understanding task and publicly available dataset with 18 million instances. The task is to predict textual values from the structured knowledge base Wikidata by reading the text of the corresponding Wikipedia articles. The task contains a rich variety of challenging classification and extraction sub-tasks, making it well-suited for end-to-end models such as deep neural networks (DNNs).

  9. Dataset of Pairs of an Image and Tags for Cataloging Image-based Records

    • data.mendeley.com
    • narcis.nl
    Updated Feb 24, 2022
    Cite
    Tokinori Suzuki (2022). Dataset of Pairs of an Image and Tags for Cataloging Image-based Records [Dataset]. http://doi.org/10.17632/msyc6mzvhg.1
    Explore at:
    Dataset updated
    Feb 24, 2022
    Authors
    Tokinori Suzuki
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Brief Explanation

    This dataset was created to develop and evaluate a cataloging system that assigns appropriate metadata to image records for database management in digital libraries. It is intended for evaluating a task in which, given an image and its assigned tags, an appropriate Wikipedia page is selected for each of the given tags.

    A main characteristic of the dataset is that it includes ambiguous tags, so the visual contents of the images are not uniquely determined by their tags. For example, it includes the tag 'mouse', which can refer either to the mammal or to the computer pointing device. The annotations are the Wikipedia articles judged by humans to be the correct entities for each tag.

    The dataset offers both data and programs that reproduce experiments for the above-mentioned task. Its data consist of image sources and annotations. The image sources are URLs of 420 images uploaded to Flickr. The annotations are a total of 2,464 relevant Wikipedia pages manually judged for the tags of the images. The dataset also provides programs in a Jupyter notebook (scripts.ipynb) to conduct a series of experiments running some baseline methods for the designated task and to evaluate the results.

    Structure of the Dataset

    1. data directory

      1.1. image_URL.txt: This file lists the URLs of the image files.

      1.2. rels.txt: This file lists the correct Wikipedia pages for each topic in topics.txt.

      1.3. topics.txt: This file lists the target pairs (each called a topic in this dataset) of an image and a tag to be disambiguated.

      1.4. enwiki_20171001.xml: This file contains texts extracted from the title and body parts of English Wikipedia articles as of 1 October 2017. It is a modified version of the Wikipedia dump data (https://archive.org/download/enwiki-20171001).

    2. img directory: This is a placeholder directory into which the image files are downloaded (see the download sketch after this list).

    3. results directory: This is a placeholder directory for storing result files for evaluation. It contains three results of baseline methods in sub-directories, holding JSON files, each of which is the result for one topic; they are ready to be evaluated with the evaluation scripts in scripts.ipynb, as a reference for both usage and performance.

    4. scripts.ipynb: The scripts for running the baseline methods and the evaluation are provided in this Jupyter notebook file.
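    A minimal download sketch for populating the img directory, assuming image_URL.txt holds one URL per line (the exact file layout and image formats may differ):

        from pathlib import Path

        import requests

        # Assumes one image URL per line in data/image_URL.txt.
        img_dir = Path("img")
        img_dir.mkdir(exist_ok=True)

        for i, url in enumerate(Path("data/image_URL.txt").read_text().splitlines()):
            url = url.strip()
            if not url:
                continue
            # Saved with a .jpg extension for simplicity; actual formats may vary.
            (img_dir / f"{i:04d}.jpg").write_bytes(requests.get(url, timeout=30).content)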

  10. Wikidata

    • famedata.miraheze.org
    • web.archive.org
    full json dump, +3 more
    Updated Jan 27, 2022
    Cite
    Wikimedia (2022). Wikidata [Dataset]. https://famedata.miraheze.org/wiki/FAMEData:Data_access
    Explore at:
    Available download formats: sparql endpoint, full rdf turtle dump, simplified ("truthy") rdf n-triples dump, full json dump
    Dataset updated
    Jan 27, 2022
    Dataset provided by
    Wikimedia Foundation (http://www.wikimedia.org/)
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Wikidata offers a wide range of general data about our universe as well as links to other databases. The data is published under the CC0 "Public domain dedication" license. It can be edited by anyone and is maintained by Wikidata's editor community.
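    The SPARQL endpoint listed among the access options above can be queried directly over HTTP; a minimal sketch (the example query simply fetches five items that are instances of human, Q5):

        import requests

        ENDPOINT = "https://query.wikidata.org/sparql"

        # Five items that are instances of "human" (Q5), with English labels.
        query = """
        SELECT ?item ?itemLabel WHERE {
          ?item wdt:P31 wd:Q5 .
          SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
        }
        LIMIT 5
        """

        resp = requests.get(ENDPOINT,
                            params={"query": query, "format": "json"},
                            headers={"User-Agent": "dataset-listing-example/0.1"},
                            timeout=60)
        for row in resp.json()["results"]["bindings"]:
            print(row["item"]["value"], row["itemLabel"]["value"])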

  11. Wikidata dump of 2016-01-04

    • figshare.com
    bz2
    Updated Jun 2, 2023
    Cite
    Daniel Hernández (2023). Wikidata dump of 2016-01-04 [Dataset]. http://doi.org/10.6084/m9.figshare.3208498.v1
    Explore at:
    Available download formats: bz2
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    figshare
    Authors
    Daniel Hernández
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset is a dump of Wikidata in JSON format, produced on January 4, 2016 by the Wikimedia Foundation. Wikidata historical dumps are not preserved by the Wikimedia Foundation, so this dump is distributed here to keep our experiments repeatable over time.

  12. .wiki TLD Whois Database | Whois Data Center

    • whoisdatacenter.com
    csv
    Updated Jul 12, 2025
    Cite
    AllHeart Web Inc (2025). .wiki TLD Whois Database | Whois Data Center [Dataset]. https://whoisdatacenter.com/tld/.wiki/
    Explore at:
    Available download formats: csv
    Dataset updated
    Jul 12, 2025
    Dataset provided by
    AllHeart Web
    Authors
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Jul 14, 2025 - Dec 31, 2025
    Description

    .WIKI Whois Database: discover comprehensive ownership details, registration dates, and more for the .WIKI TLD with Whois Data Center.

  13. NetQOS ORACLE Database Effort

    • catalog.data.gov
    Updated May 22, 2025
    Cite
    Social Security Administration (2025). NetQOS ORACLE Database Effort [Dataset]. https://catalog.data.gov/dataset/netqos-oracle-database-effort
    Explore at:
    Dataset updated
    May 22, 2025
    Dataset provided by
    Social Security Administration (http://ssa.gov/)
    Description

    Back-end ORACLE database to house NetQOS network performance and network characterization data for historical analysis and reporting.

  14. Wikipedia XML revision history data dumps (stub-meta-history.xml.gz) from 20 April 2017

    • search.dataone.org
    • datadryad.org
    Updated Jun 22, 2025
    Cite
    R. Stuart Geiger; Aaron Halfaker (2025). Wikipedia XML revision history data dumps (stub-meta-history.xml.gz) from 20 April 2017 [Dataset]. http://doi.org/10.6078/D1FD3K
    Explore at:
    Dataset updated
    Jun 22, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    R. Stuart Geiger; Aaron Halfaker
    Time period covered
    Jan 1, 2017
    Description

    Wikipedia revision metadata for every edit to every page in seven major language versions of Wikipedia. Files in this collection are database dumps originally from https://dumps.wikimedia.org and were created by the Wikimedia Foundation, which deletes dumps after approximately 6 months. They are being uploaded here to preserve computational reproducibility of research projects based on these specific dumps. Files are in the format [language]wiki-20170420-stub-meta-history[part].xml.gz for English (en), German (de), Spanish (es), French (fr), Japanese (ja), Portuguese (pt), and Chinese Mandarin (zh) Wikipedias. Material found within is copyright Wikipedia contributors and is freely licensed under the GNU Free Documentation License (GFDL) and the Creative Commons Attribution-Share-Alike 3.0 License.
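    Because these stub dumps follow the standard MediaWiki export XML format, one part can be streamed without loading it fully into memory; a sketch (the file name is illustrative, following the pattern above):

        import gzip
        import xml.etree.ElementTree as ET

        # Illustrative file name following the pattern described above.
        path = "enwiki-20170420-stub-meta-history1.xml.gz"

        def localname(tag):
            # Strip the MediaWiki export XML namespace from a tag.
            return tag.rsplit("}", 1)[-1]

        revisions = 0
        with gzip.open(path, "rb") as fh:
            for _event, elem in ET.iterparse(fh):
                name = localname(elem.tag)
                if name == "revision":
                    revisions += 1
                if name in ("revision", "page"):
                    elem.clear()  # keep memory use bounded
        print(revisions, "revisions in", path)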

  15. WikiSQL Dataset

    • paperswithcode.com
    • opendatalab.com
    Cite
    Victor Zhong; Caiming Xiong; Richard Socher, WikiSQL Dataset [Dataset]. https://paperswithcode.com/dataset/wikisql
    Explore at:
    Authors
    Victor Zhong; Caiming Xiong; Richard Socher
    Description

    WikiSQL consists of a corpus of 87,726 hand-annotated SQL query and natural language question pairs. These SQL queries are further split into training (61,297 examples), development (9,145 examples) and test sets (17,284 examples). It can be used for natural language inference tasks related to relational databases.
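    A minimal loading sketch, assuming the JSON-lines layout from the WikiSQL GitHub release (one object per line with question, table_id, and sql fields):

        import json

        # Assumed layout: one JSON object per line, e.g. data/train.jsonl.
        with open("data/train.jsonl", encoding="utf-8") as fh:
            examples = [json.loads(line) for line in fh]

        print(len(examples), "training pairs")
        print(examples[0]["question"])
        print(examples[0]["sql"])  # structured query (column, aggregation, conditions)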

  16. A global reference database of crowdsourced cropland data collected using the Geo-Wiki platform

    • zenodo.org
    • doi.pangaea.de
    • +2 more
    zip
    Updated Jul 16, 2024
    Cite
    Linda See (2024). A global reference database of crowdsourced cropland data collected using the Geo-Wiki platform [Dataset]. http://doi.org/10.1594/pangaea.873912
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Linda See
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A global reference dataset on cropland was collected through a crowdsourcing campaign implemented using Geo-Wiki. This reference dataset is based on a systematic sample at latitude and longitude intersections, enhanced in locations where the cropland probability varies between 25% and 75% for a better representation of cropland globally. Over a three-week period, around 36K samples of cropland were collected. For the purpose of quality assessment, additional datasets are provided. One is a control dataset of 1793 sample locations that have been validated by students trained in image interpretation; this dataset was used to assess the quality of the crowd validations as the campaign progressed. Another set contains 60 expert or gold standard validations for additional evaluation of the quality of the participants. These three datasets have two parts: one showing cropland only, and one compiled per location and user. This reference dataset will be used to validate and compare medium and high resolution cropland maps that have been generated using remote sensing. The dataset can also be used to train classification algorithms in developing new maps of land cover and cropland extent.

  17. Detecting Synonymous Relationships by Shared Data-driven Definitions

    • figshare.com
    txt
    Updated Dec 9, 2019
    Cite
    Jan-Christoph Kalo (2019). Detecting Synonymous Relationships by Shared Data-driven Definitions [Dataset]. http://doi.org/10.6084/m9.figshare.11343785.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 9, 2019
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jan-Christoph Kalo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets that can be used together with the code at https://github.com/JanKalo/RuleAlign.

  18. Querying Wikidata: Comparing SPARQL, Relational and Graph Databases ― Complementary Documentation

    • figshare.com
    html
    Updated May 4, 2016
    Cite
    Daniel Hernández (2016). Querying Wikidata: Comparing SPARQL, Relational and Graph Databases ― Complementary Documentation [Dataset]. http://doi.org/10.6084/m9.figshare.3219217.v3
    Explore at:
    Available download formats: html
    Dataset updated
    May 4, 2016
    Dataset provided by
    figshare
    Authors
    Daniel Hernández
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This resource includes complementary documentation that aims to make repeatable the experiments described in the paper Querying Wikidata: Comparing SPARQL, Relational and Graph Databases (by Daniel Hernández, Aidan Hogan, Cristian Riveros, Carlos Rojas and Enzo Zerega).

  19. BridgeDb: pathway identifier mapping database derived from Wikidata

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Dec 13, 2021
    Cite
    Egon Willighagen (2021). BridgeDb: pathway identifier mapping database derived from Wikidata [Dataset]. http://doi.org/10.5281/zenodo.5773822
    Explore at:
    Available download formats: bin
    Dataset updated
    Dec 13, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Egon Willighagen
    Description

    First release of a BridgeDb pathway identifier mapping database. It currently supports Wikidata and WikiPathways identifiers. Released under CC0 (CCZero).

    [INFO]: Database finished.
    INFO: old database is Wikidata 1.0.0 (build: 20211211)
    INFO: new database is Wikidata 1.0.0 (build: 20211211)
    INFO: Number of ids in Wd (Wikidata): 905 (unchanged)
    INFO: Number of ids in Wp (WikiPathways): 900 (unchanged)
    INFO: new size is 2 Mb (changed +0.0%)
    INFO: total number of identifiers is 1805
    INFO: total number of mappings is 1810
    

  20. DDBS Programming Tables

    • catalog.data.gov
    • s.cnmilf.com
    Updated May 22, 2025
    Cite
    Social Security Administration (2025). DDBS Programming Tables [Dataset]. https://catalog.data.gov/dataset/ddbs-programming-tables
    Explore at:
    Dataset updated
    May 22, 2025
    Dataset provided by
    Social Security Administration (http://ssa.gov/)
    Description

    The data store supporting processes developed by the Division of Database Support (DDBS), such as the commit checkpoint routine.
