Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the files created for the experiments presented in the article "Publication and Maintenance of RDB2RDF Views Externally Materialized in Enterprise Knowledge Graphs".
mapR2RML_MusicBrainz_completo.txt: We created the R2RML mapping for translating MBD data into the Music Ontology vocabulary, which is used for publishing the LMB view. The LMB view was materialized using the D2RQ tool; materializing the view took 67 minutes and produced approximately 41.1 GB of N-Triples. We also provide a SPARQL endpoint for querying the LMB view (a query sketch follows the file list below).
TriggersAndProcedures.txt: We created the triggers, procedures, and a Java class to implement the rules required to compute and publish the changesets.
relationalViewDefinition.pdf: This document gives details about the process of creating the relational views used in the experiments.
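As an illustration of how the published endpoint can be consumed, the minimal Python sketch below issues a SPARQL query with the SPARQLWrapper library. The endpoint URL is a placeholder (the actual endpoint address of the LMB view is not listed here), and the query simply asks for a few artists in Music Ontology terms.

```python
# Minimal sketch of querying the SPARQL endpoint of an RDB2RDF view.
# The endpoint URL is hypothetical; replace it with the LMB view's actual endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://example.org/lmb/sparql"  # placeholder endpoint URL

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery("""
    PREFIX mo: <http://purl.org/ontology/mo/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?artist ?name WHERE {
        ?artist a mo:MusicArtist ;
                foaf:name ?name .
    }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["artist"]["value"], binding["name"]["value"])
```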
https://www.marketresearchforecast.com/privacy-policy
The Graph Database Market size was valued at USD 1.9 billion in 2023 and is projected to reach USD 7.91 billion by 2032, exhibiting a CAGR of 22.6% during the forecast period. A graph database is a form of NoSQL database that stores and represents relationships as graphs. Rather than modeling data as relations, as most contemporary relational databases do, graph databases use nodes, edges, and properties. The primary types include property graphs, which permit attributes on nodes and edges, and RDF triplestores, which center on subject-predicate-object triples. Notable features include fast traversal of relationships, easy schema changes, and scalability. Familiar use cases are social media, recommendations, anomaly or fraud detection, and knowledge graphs, where the relationships are complex and require deeper comprehension. These databases are considered valuable where the connections between data items are as significant as the data themselves. Key drivers for this market are: Increasing Adoption of Cloud-based Managed Services to Drive Market Growth. Potential restraints include: Adverse Health Effect May Hamper Market Growth. Notable trends are: Growing Implementation of Touch-based and Voice-based Infotainment Systems to Increase Adoption of Intelligent Cars.
https://dataintelo.com/privacy-and-policy
In 2023, the global desktop database software market size was valued at approximately USD 5.5 billion and is projected to reach around USD 12.4 billion by 2032, growing at a Compound Annual Growth Rate (CAGR) of 9.2% during the forecast period. This robust growth is primarily driven by the increasing need for effective data management solutions across various industries and the rising adoption of digital transformation strategies worldwide.
One of the major growth factors for this market is the increasing volume of data generated by businesses. With the proliferation of IoT devices, social media, and enterprise applications, companies are producing an unprecedented amount of data. This surge in data generation necessitates sophisticated database software that can manage, store, and analyze data efficiently. Additionally, the growing importance of data-driven decision-making and analytics has heightened the demand for robust database solutions. Companies are increasingly looking to leverage data insights to gain a competitive edge, contributing significantly to market expansion.
Another critical factor is the advancement in technology, particularly in cloud computing. Cloud-based desktop database software offers numerous advantages, including scalability, flexibility, and cost-effectiveness. These benefits are particularly appealing to small and medium enterprises (SMEs) that may not have the resources to invest in extensive on-premises infrastructure. The cloud deployment model allows businesses to reduce their IT overheads and focus more on their core operations, further driving the adoption of desktop database software.
The increasing focus on cybersecurity and data protection is also fueling market growth. With rising instances of data breaches and cyber-attacks, businesses are becoming more vigilant about safeguarding their data. Desktop database software with robust security features is becoming essential to meet compliance requirements and protect sensitive information. This growing awareness and need for secure data management solutions are propelling the demand for advanced database software.
RDF Databases Software is gaining traction as a powerful tool for managing and querying complex data relationships. These databases are particularly adept at handling semantic data, making them ideal for applications that require understanding and interpretation of data context, such as knowledge graphs and linked data projects. The flexibility of RDF databases allows for dynamic data integration and interoperability across various platforms, which is increasingly important in today's data-driven world. As organizations continue to seek ways to harness the full potential of their data, the adoption of RDF databases is expected to rise, offering enhanced capabilities for semantic data processing and analysis. This trend is further supported by the growing interest in AI and machine learning, where RDF databases can play a crucial role in providing structured data for training and inference.
On the regional front, North America currently holds the largest market share due to its well-established IT infrastructure and the presence of numerous leading database software providers. The region's strong focus on technological innovation and early adoption of new technologies also play a significant role. Meanwhile, the Asia Pacific region is expected to witness the highest growth rate during the forecast period. The rapid digital transformation initiatives in countries like China and India, coupled with the increasing adoption of cloud services and the expansion of the SME sector, are key drivers for market growth in this region.
When analyzing the desktop database software market by type, three primary categories emerge: Relational Database, NoSQL Database, and NewSQL Database. Relational databases have been the traditional backbone of enterprise data management for decades. They use structured query language (SQL) for defining and manipulating data, which makes them highly reliable for transactions and complex queries. Despite being an older technology, the demand for relational databases remains strong due to their robustness, reliability, and extensive support community. They are particularly favored in applications that require complex transactional capabilities, such as financial systems and enterprise resource planning (ERP) solutions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Knowledge Graph Construction Workshop 2024: challenge
Knowledge graph construction of heterogeneous data has seen a lot of uptake in the last decade, from compliance to performance optimizations with respect to execution time. Besides execution time as a metric for comparing knowledge graph construction, other metrics, e.g. CPU or memory usage, are not considered. This challenge aims at benchmarking systems to find which RDF graph construction system optimizes for metrics such as execution time, CPU, memory usage, or a combination of these metrics.
Task description
The task is to reduce and report the execution time and computing resources (CPU and memory usage) for the parameters listed in this challenge, compared to the state of the art of existing tools and the baseline results provided by this challenge. This challenge is not limited to execution times to create the fastest pipeline, but also covers computing resources to achieve the most efficient pipeline.
We provide a tool which can execute such pipelines end-to-end. This tool also collects and aggregates the metrics, such as execution time, CPU and memory usage, necessary for this challenge as CSV files. Moreover, the information about the hardware used during the execution of the pipeline is available as well to allow fairly comparing different pipelines. Your pipeline should consist of Docker images which can be executed on Linux to run the tool. The tool is already tested with existing systems, relational databases (e.g. MySQL and PostgreSQL), and triplestores (e.g. Apache Jena Fuseki and OpenLink Virtuoso), which can be combined in any configuration. It is strongly encouraged to use this tool for participating in this challenge. If you prefer to use a different tool, or our tool imposes technical requirements you cannot solve, please contact us directly.
Track 1: Conformance
The set of new specifications for the RDF Mapping Language (RML) established by the W3C Community Group on Knowledge Graph Construction provides test cases for each module:
RML-Core
RML-IO
RML-CC
RML-FNML
RML-Star
These test cases are evaluated in this Track of the Challenge to determine their feasibility, correctness, etc. by applying them in implementations. This Track is in Beta status because these new specifications have not seen any implementation yet, so it may contain bugs and issues. If you find problems with the mappings, output, etc., please report them to the corresponding repository of each module.
Note: validating the output of the RML-Star module automatically through the provided tooling is currently not possible; see https://github.com/kg-construct/challenge-tool/issues/1.
Through this Track we aim to spark development of implementations for the new specifications and to improve the test cases. Let us know about any problems with the test cases and we will try to find a solution.
Track 2: Performance
Part 1: Knowledge Graph Construction Parameters
These parameters are evaluated using synthetically generated data to gain more insight into their influence on the pipeline.
Data
Number of data records: scaling the data size vertically by the number of records with a fixed number of data properties (10K, 100K, 1M, 10M records).
Number of data properties: scaling the data size horizontally by the number of data properties with a fixed number of data records (1, 10, 20, 30 columns).
Number of duplicate values: scaling the number of duplicate values in the dataset (0%, 25%, 50%, 75%, 100%).
Number of empty values: scaling the number of empty values in the dataset (0%, 25%, 50%, 75%, 100%).
Number of input files: scaling the number of datasets (1, 5, 10, 15).
Mappings
Number of subjects: scaling the number of subjects with a fixed number of predicates and objects (1, 10, 20, 30 TMs).
Number of predicates and objects: scaling the number of predicates and objects with a fixed number of subjects (1, 10, 20, 30 POMs).
Number and type of joins: scaling the number and type of joins (1-1, N-1, 1-N, N-M).
Part 2: GTFS-Madrid-Bench
The GTFS-Madrid-Bench provides insights into the pipeline with real data from the public transport domain in Madrid.
Scaling
GTFS-1 SQL
GTFS-10 SQL
GTFS-100 SQL
GTFS-1000 SQL
Heterogeneity
GTFS-100 XML + JSON
GTFS-100 CSV + XML
GTFS-100 CSV + JSON
GTFS-100 SQL + XML + JSON + CSV
Example pipeline
The ground truth dataset and baseline results are generated in different steps for each parameter:
The provided CSV files and SQL schema are loaded into a MySQL relational database.
Mappings are executed by accessing the MySQL relational database to construct a knowledge graph in the N-Triples RDF format.
The pipeline is executed 5 times, from which the median execution time of each step is calculated and reported. Each step with the median execution time is then reported in the baseline results with all its measured metrics. The query timeout is set to 1 hour and the knowledge graph construction timeout to 24 hours. The execution is performed with the following tool: https://github.com/kg-construct/challenge-tool; you can adapt the execution plans for this example pipeline to your own needs.
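For orientation, the median-over-runs aggregation described above can be reproduced from the metrics CSV files the tool emits, along the lines of the following Python sketch. The file name and the column names ("step", "execution_time") are assumptions about the CSV layout, not the tool's documented schema.

```python
# Sketch of the median-over-runs aggregation; adjust the file name and the
# assumed column names ("step", "execution_time") to the tool's actual CSV layout.
import csv
from collections import defaultdict
from statistics import median

times = defaultdict(list)  # step name -> execution times collected across runs
with open("metrics.csv", newline="") as f:
    for row in csv.DictReader(f):
        times[row["step"]].append(float(row["execution_time"]))

for step, values in times.items():
    print(f"{step}: median execution time {median(values):.2f}s over {len(values)} runs")
```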
Each parameter has its own directory in the ground truth dataset with the following files:
Input dataset as CSV.
Mapping file as RML.
Execution plan for the pipeline in metadata.json.
Datasets
Knowledge Graph Construction Parameters
The dataset consists of:
Input dataset as CSV for each parameter.
Mapping file as RML for each parameter.
Baseline results for each parameter with the example pipeline.
Ground truth dataset for each parameter generated with the example pipeline.
Format
All input datasets are provided as CSV; depending on the parameter that is being evaluated, the number of rows and columns may differ. The first row is always the header of the CSV.
GTFS-Madrid-Bench
The dataset consists of:
Input dataset as CSV with SQL schema for the scaling, and a combination of XML, CSV, and JSON for the heterogeneity.
Mapping file as RML for both scaling and heterogeneity.
SPARQL queries to retrieve the results.
Baseline results with the example pipeline.
Ground truth dataset generated with the example pipeline.
Format
CSV datasets always have a header as their first row. JSON and XML datasets have their own schema.
Evaluation criteria
Submissions must evaluate the following metrics:
Execution time of all the steps in the pipeline. The execution time of a step is the difference between the begin and end time of a step.
CPU time as the time spent in the CPU for all steps of the pipeline. The CPU time of a step is the difference between the begin and end CPU time of a step.
Minimal and maximal memory consumption for each step of the pipeline. The minimal and maximal memory consumption of a step are the minimum and maximum of the memory consumption measured during the execution of that step.
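A minimal sketch of how these three metrics can be derived for a single step is shown below, assuming the step runs in the current process and memory is sampled periodically with the psutil library. This illustrates the metric definitions only; it is not the challenge tool's implementation.

```python
# Illustration of the metric definitions for one step: wall-clock execution time,
# CPU time (user + system), and minimal/maximal memory during execution.
# Sketch only: the step runs in-process and memory is sampled with psutil.
import time
import threading
import psutil

def measure_step(step_fn, interval=0.1):
    proc = psutil.Process()
    samples, stop = [], threading.Event()

    def sample_memory():
        while not stop.is_set():
            samples.append(proc.memory_info().rss)  # resident memory in bytes
            time.sleep(interval)

    sampler = threading.Thread(target=sample_memory)
    wall_begin = time.monotonic()
    cpu_begin = sum(proc.cpu_times()[:2])  # user + system CPU time at begin
    sampler.start()
    step_fn()
    stop.set()
    sampler.join()
    return {
        "execution_time": time.monotonic() - wall_begin,    # end minus begin wall-clock time
        "cpu_time": sum(proc.cpu_times()[:2]) - cpu_begin,  # end minus begin CPU time
        "memory_min": min(samples) if samples else None,
        "memory_max": max(samples) if samples else None,
    }

print(measure_step(lambda: sum(i * i for i in range(10_000_000))))
```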
Expected output
Duplicate values
Scale | Number of Triples
---|---
0 percent | 2000000 triples
25 percent | 1500020 triples
50 percent | 1000020 triples
75 percent | 500020 triples
100 percent | 20 triples
Empty values
Scale | Number of Triples
---|---
0 percent | 2000000 triples
25 percent | 1500000 triples
50 percent | 1000000 triples
75 percent | 500000 triples
100 percent | 0 triples
Mappings
Scale | Number of Triples
---|---
1TM + 15POM | 1500000 triples
3TM + 5POM | 1500000 triples
5TM + 3POM | 1500000 triples
15TM + 1POM | 1500000 triples
Properties
Scale | Number of Triples
---|---
1M rows 1 column | 1000000 triples
1M rows 10 columns | 10000000 triples
1M rows 20 columns | 20000000 triples
1M rows 30 columns | 30000000 triples
Records
Scale | Number of Triples
---|---
10K rows 20 columns | 200000 triples
100K rows 20 columns | 2000000 triples
1M rows 20 columns | 20000000 triples
10M rows 20 columns | 200000000 triples
Joins
1-1 joins
Scale | Number of Triples
---|---
0 percent | 0 triples
25 percent | 125000 triples
50 percent | 250000 triples
75 percent | 375000 triples
100 percent | 500000 triples
1-N joins
Scale | Number of Triples
---|---
1-10 0 percent | 0 triples
1-10 25 percent | 125000 triples
1-10 50 percent | 250000 triples
1-10 75 percent | 375000 triples
1-10 100 percent | 500000 triples
1-5 50 percent | 250000 triples
1-10 50 percent | 250000 triples
1-15 50 percent | 250005 triples
1-20 50 percent | 250000 triples
N-1 joins
Scale | Number of Triples
---|---
10-1 0 percent | 0 triples
10-1 25 percent | 125000 triples
10-1 50 percent | 250000 triples
10-1 75 percent | 375000 triples
10-1 100 percent | 500000 triples
5-1 50 percent | 250000 triples
10-1 50 percent | 250000 triples
15-1 50 percent | 250005 triples
20-1 50 percent | 250000 triples
N-M joins
Scale | Number of Triples
---|---
5-5 50 percent | 1374085 triples
10-5 50 percent | 1375185 triples
5-10 50 percent | 1375290 triples
5-5 25 percent | 718785 triples
5-5 50 percent | 1374085 triples
5-5 75 percent | 1968100 triples
5-5 100 percent | 2500000 triples
5-10 25 percent | 719310 triples
5-10 50 percent | 1375290 triples
5-10 75 percent | 1967660 triples
5-10 100 percent | 2500000 triples
10-5 25 percent | 719370 triples
10-5 50 percent | 1375185 triples
10-5 75 percent | 1968235 triples
10-5 100 percent | 2500000 triples
GTFS Madrid Bench
Generated Knowledge Graph
Scale | Number of Triples
---|---
1 | 395953 triples
10 | 3959530 triples
100 | 39595300 triples
1000 | 395953000 triples
Queries
Query | Scale 1 | Scale 10 | Scale 100 | Scale 1000
---|---|---|---|---
Q1 | 58540 results | 585400 results | No results available | No results available
Q2 | 636 results | 11998 results | 125565 results | 1261368 results
Q3 | 421 results | 4207 results | 42067 results | 420667 results
Q4 | 13 results | 130 results | 1300 results | 13000 results
Q5 | 35 results | 350 results | 3500 results | 35000 results
Q6 | 1 result | 1 result | 1 result | 1 result
Q7 | 68 results | 67 results | 67 results | 53 results
Q8 | 35460 results | 354600 results | No results available | No results available
Q9 | 130 results | 1300 | |
See the package documentation website at dataset.dataobservatory.eu. Report bugs and suggestions on GitHub: https://github.com/dataobservatory-eu/dataset/issues. The primary aim of dataset is to build well-documented data.frames, tibbles or data.tables that follow the W3C Data Cube Vocabulary, which is based on the statistical SDMX data cube model. Such standard R objects (data.frame, data.table, tibble, or well-structured lists like JSON) become highly interoperable and can be placed into relational databases, semantic web applications, archives and repositories. They follow the FAIR principles: they are findable, accessible, interoperable and reusable. Our datasets:
Contain Dublin Core or DataCite (or both) metadata that makes them findable and more easily accessible via online libraries. See the vignette article Datasets With FAIR Metadata.
Have dimensions that can be easily and unambiguously reduced to triples for RDF applications; they can be easily serialized to, or synchronized with, semantic web applications. See the vignette article From dataset To RDF.
Contain processing metadata that greatly enhances the reproducibility of the results and the reviewability of the contents of the dataset, including metadata defined by the DDI Alliance, which is particularly helpful for not-yet-processed data.
Follow the data cube model of the Statistical Data and Metadata eXchange (SDMX), therefore allowing easy refreshing with new data from the source of the analytical work; this is particularly useful for datasets containing the results of statistical operations in R.
Export correctly with FAIR metadata to the most used file formats and publish straightforwardly to open science repositories with correct bibliographical and use metadata. See Export And Publish a dataset.
Are relatively lightweight in dependencies and work easily with data.frame, tibble or data.table R objects.
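To illustrate the "reduced to triples" idea in a language-neutral way (shown here in Python with rdflib rather than in R), one observation row of a data-cube-shaped table can be expressed as subject-predicate-object statements. The example.org namespace and the geo/year/value columns below are invented for the illustration; only the qb: namespace is the actual W3C Data Cube vocabulary.

```python
# Illustration only: expressing one observation of a data-cube-style table as
# RDF triples. The example.org namespace and the geo/year/value columns are
# invented; qb: is the W3C Data Cube vocabulary.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/dataset/")
QB = Namespace("http://purl.org/linked-data/cube#")

row = {"geo": "NL", "year": 2023, "value": 41.5}  # one observation (one table row)

g = Graph()
obs = EX["obs/NL-2023"]
g.add((obs, RDF.type, QB.Observation))       # the row becomes a qb:Observation
for column, cell in row.items():
    g.add((obs, EX[column], Literal(cell)))  # each cell becomes one triple

print(g.serialize(format="turtle"))
```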
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dutch Ships and Sailors is a project that aims to provide an infrastructure for maritime historical datasets, linking correlating data through semantic web technology. It brings together datasets related to recruitment and shipping in the East India trade (mainly 18th century) and in the shipping of the northern provinces of the Netherlands (mainly 19th century). For the northern provinces, the database contains data on the personnel recruited, the ships, and other variables (muster rolls of the northern provinces of the Netherlands).
Dutch Ships and Sailors is a CLARIN IV project, hosted by Huygens ING in collaboration with VU University Amsterdam, the International Institute of Social History and Scheepvaartmuseum Amsterdam. The data from this project are divided over 5 datasets. See the ‘Thematic collection: Dutch Ships and Sailors’ dataset for a full overview.
This dataset is an RDF/Turtle conversion of the EASY dataset: "Velzen, Drs A.J.M. van; Gaastra, Prof.dr. F.S. (2012-01-26), VOC Opvarenden, versie 6 - januari 2012; VOC Sea Voyagers", Persistent Identifier: urn:nbn:nl:ui:13-gupf-pd. This source dataset contains the digitized 18th-century personnel administration of the VOC (Dutch East India Company), collected from archival data and processed in a relational database. The source dataset was split into three separate parts: ‘opvarenden’, ‘soldijboeken’ and ‘begunstigden’ (voyagers, salary books and beneficiaries). This substructuring was left intact in the subsequent conversions.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These RDF triples (synthea_graph_exportable.nq.zip) are the result of modeling electronic health records (synthea_csv_output_turbo_cannonical.zip) that were synthesized with the Synthea software (https://github.com/synthetichealth/synthea). Anyone who loads them into a triplestore database is encouraged to provide feedback at https://github.com/PennTURBO/EhrGraphCollab/issues. The following abstract comes from a paper describing the semantic instantiation process, presented at the ICBO 2019 conference (https://drive.google.com/file/d/1eYXTBl75Wx3XPMmCIOZba-8Cv0DIhlRq/view).
ABSTRACT: There is ample literature on the semantic modeling of biomedical data in general, but less has been published on realism-based, semantic instantiation of electronic health records (EHR). Reasons include difficult design decisions and issues of data governance. A collaborative approach can address design and technology utilization issues, but is especially constrained by limited access to the data at hand: protected health information.
Effective collaboration can be facilitated by public EHR-like data sets, which would ideally include a large variety of datatypes mirroring actual EHRs and enough records to drive a performance assessment. An investment into reading public EHR-like data from a popular common data model (CDM) is preferable over reading each public data set’s native format.
In addition to identifying suitable public EHR-like data sets and CDMs, this paper addresses instantiation via relational-to-RDF mapping. The completed instantiation is available for download, and a competency question demonstrates fidelity across all discussed formats.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The catalogue of the Biblioteca Virtual Miguel de Cervantes contains about 200,000 records which were originally created in compliance with the MARC21 standard. The entries in the catalogue have been recently migrated to a new relational database whose data model adheres to the conceptual models promoted by the International Federation of Library Associations and Institutions (IFLA), in particular, to the FRBR and FRAD specifications.
The database content has later been mapped, by means of an automated procedure, to RDF triples which mainly employ the RDA (Resource Description and Access) vocabulary to describe the entities, as well as their properties and relationships. In contrast to a direct transformation, the intermediate relational model provides tighter control over the process, for example through referential integrity, and therefore enhanced validation of the output. This RDF-based semantic description of the catalogue is now accessible online.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Baseline data for dogdb (data of government database). The relational data schema is 629 tables linked by foreign key constraints. The semantic data is 39 million RDF triples.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These RDF triples are the result of modeling electronic health care records synthesized with the Synthea software and can be loaded into a triplestore. The following abstract comes from a paper describing the semantic instantiation process, submitted to the ICBO 2019 conference.
ABSTRACT: There is ample literature on the semantic modeling of biomedical data in general, but less has been published on realism-based, semantic instantiation of electronic health records (EHR). Reasons include difficult design decisions and issues of data governance. A collaborative approach can address design and technology utilization issues, but is especially constrained by limited access to the data at hand: protected health information.
Effective collaboration can be facilitated by public EHR-like data sets, which would ideally include a large variety of datatypes mirroring actual EHRs and enough records to drive a performance assessment. An investment into reading public EHR-like data from a popular common data model (CDM) is preferable over reading each public data set’s native format.
In addition to identifying suitable public EHR-like data sets and CDMs, this paper addresses instantiation via relational-to-RDF mapping. The completed instantiation is available for download, and a competency question demonstrates fidelity across all discussed formats.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
80 GB semi-synthetic data for dogdb (data of government database). The relational data schema is 629 tables linked by foreign key constraints. The semantic data is 4.1 billion RDF triples.
This Natural Language Processing (NLP) dataset contains a part of the MySQL Corpus Nummorum (CN) database. It covers Greek and Roman coins from ancient Thrace, Moesia Inferior, Troad and Mysia. The dataset contains 7,900 coin descriptions (or designs) created by the members of the CN project. Most of them are actual coin designs which can be linked through our relational database to the matching CN coins, types and their images. However, some of them (about 450) were only created for the training of the NLP model. There are nine different MySQL tables:
data_coins: contains the data of all coins in the CN database
data_coins_images: contains the data of all images in the CN database
data_coins_imagesets: contains the image pairs for the CN coins
data_designs: contains every coin description in German, English and Bulgarian
data_types: contains the data of all coin types in the CN database
nlp_hierarchy: contains the classes and subclasses of all entity categories
nlp_list_entities: contains the data of all NLP entities in the CN database
nlp_relation_extraction_en_v2: contains the annotations for the training of our NLP model
nlp_training_designs: contains the coin designs used for training our NLP model
Only tables 8 and 9 are important for NLP training, as they contain the descriptions and the corresponding annotations. The other tables (data_...) make it possible to link the coin descriptions with the various coins and types in the CN database. It is therefore also possible to provide the CN image datasets with the appropriate descriptions (CN - Coin Image Dataset and CN - Object Detection Coin Dataset). The other NLP tables provide information about the entities and relations in the descriptions and are used to create the RDF data for the nomisma.org portal. The tables of the relational CN database can be related via the various ID columns using foreign keys. For easier access without MySQL, we have attached two CSV files with the descriptions in English and German and the annotations for the English designs. The annotations can be related to the descriptions via the Design_ID column. During the summer semester 2024, we held the "Data Challenge" event at our Department of Computer Science at Goethe University. Our students could choose between the Object Detection dataset and a Natural Language Processing dataset as their challenge. We gave the teams that decided to take part in the NLP challenge this dataset with the task of trying out their own ideas. Here are the results: LLM_RE Pipeline, Coin description embeddings, NLP coin app. Now we would like to invite you to try out your own ideas and models on our coin data. If you have any questions or suggestions, please feel free to contact us.
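As a small usage sketch, the two attached CSV files can be joined on the Design_ID column with pandas. The file names below are placeholders, since the exact names depend on the dataset download.

```python
# Sketch of relating the annotation CSV to the English descriptions via Design_ID.
# The file names are placeholders; use the names of the CSVs shipped with the dataset.
import pandas as pd

descriptions = pd.read_csv("designs_english.csv")     # hypothetical file name
annotations = pd.read_csv("annotations_english.csv")  # hypothetical file name

# Each annotation row is matched to its coin description through Design_ID.
merged = annotations.merge(descriptions, on="Design_ID", how="left")
print(merged.head())
```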
https://creativecommons.org/licenses/publicdomain/
Graph neural networks and other machine learning models offer a promising direction for interpretable machine learning on relational and multimodal data. Until now, however, progress in this area has been difficult to gauge. This is primarily due to a limited number of datasets with (a) a high enough number of labeled nodes in the test set for precise measurement of performance, and (b) a rich enough variety of multimodal information to learn from. Here, we introduce a set of new benchmark tasks for node classification on knowledge graphs. We focus primarily on node classification, since this setting cannot be solved purely by node embedding models, instead requiring the model to pool information from several steps away in the graph. However, the datasets may also be used for link prediction. For each dataset, we provide test and validation sets of at least 1000 instances, with some containing more than 10,000 instances. Each task can be performed in a purely relational manner, to evaluate the performance of a relational graph model in isolation, or with multimodal information, to evaluate the performance of multimodal relational graph models. All datasets are packaged in a CSV format that is easily consumable in any machine learning environment, together with the original source data in RDF and pre-processing code for full provenance. We provide code for loading the data into numpy and pytorch. We compute performance for several baseline models.
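The benchmark ships its own loading code; purely as an illustration of how compact such CSV-packaged graphs are to consume, the following sketch reads an integer-indexed triples file into numpy and pytorch. It assumes a header-less CSV with three columns (subject id, relation id, object id) and an invented file name, which may not match the benchmark's actual layout.

```python
# Generic sketch (not the benchmark's own loaders): read an integer-indexed
# triples CSV into numpy and pytorch. Assumes a header-less file with three
# columns (subject id, relation id, object id); the file name is invented.
import numpy as np
import torch

triples = np.loadtxt("triples.csv", delimiter=",", dtype=np.int64)  # shape: (num_edges, 3)
edge_index = torch.from_numpy(triples[:, [0, 2]].T.copy())  # 2 x num_edges (subject, object)
edge_type = torch.from_numpy(triples[:, 1].copy())          # relation id per edge

print(edge_index.shape, edge_type.shape)
```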
https://www.marketresearchintellect.com/ja/privacy-policy
The size and share of this market are segmented according to the following criteria: Document Store (JSON Document Store, XML Document Store, Binary Document Store), Key-Value Store (Distributed Key-Value Store, In-Memory Key-Value Store, Persistent Key-Value Store), Column Family Store (Wide Column Store, Sparse Column Store, Multi-Model Column Store), Graph Database (Property Graph Database, Resource Description Framework (RDF) Store, Hypergraph Database), Time Series Database (Event Time Series Database, Log Time Series Database, Real-Time Time Series Database), and by region (North America, Europe, Asia Pacific, South America, Middle East and Africa).