25 datasets found

The global Graph Analytics market size is USD 2522 million in 2024 and will...
cognitivemarketresearch.com
pdf,excel,csv,ppt
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cognitive Market Research, The global Graph Analytics market size is USD 2522 million in 2024 and will expand at a compound annual growth rate (CAGR) of 34.0% from 2024 to 2031. [Dataset]. https://www.cognitivemarketresearch.com/graph-analytics-market-report
Explore at:
pdf,excel,csv,pptAvailable download formats
Dataset authored and provided by
Cognitive Market Research
License
https://www.cognitivemarketresearch.com/privacy-policyhttps://www.cognitivemarketresearch.com/privacy-policy
Time period covered
2021 - 2033
Area covered
Global
Description
According to Cognitive Market Research, the global Graph Analytics market size will be USD 2522 million in 2024 and will expand at a compound annual growth rate (CAGR) of 34.0% from 2024 to 2031. Market Dynamics of Graph Analytics Market

Key Drivers for Graph Analytics Market

Increasing Recognition of the Advantages of Graph Databases- One of the main reasons for the Graph Analytics market is the increasing recognition of the advantages of graph databases. Unlike traditional relational databases, graph databases excel at handling complex relationships and interconnected data, making them ideal for use cases such as fraud detection, recommendation engines, and social network analysis. Businesses are leveraging these capabilities to uncover insights and patterns that were previously difficult to detect. The rise of big data and the need for real-time analytics are further driving the adoption of graph databases, as they offer enhanced performance and scalability for large-scale data sets. Additionally, advancements in artificial intelligence and machine learning are amplifying the value of graph databases, enabling more sophisticated data modeling and predictive analytics. Growing Uptake of Big Data Tools to Drive the Graph Analytics Market's Expansion in the Years Ahead.

Key Restraints for Graph Analytics Market

Limited Awareness and Understanding pose a serious threat to the Graph Analytics industry. The market also faces significant difficulties related to data security and privacy.

Introduction of the Graph Analytics Market

The Graph Analytics Market is rapidly expanding, driven by the growing need for advanced data analysis techniques in various sectors. Graph analytics leverages graph structures to represent and analyze relationships and dependencies, providing deeper insights than traditional data analysis methods. Key factors propelling this market include the rise of big data, the increasing adoption of artificial intelligence and machine learning, and the demand for real-time data processing. Industries such as finance, healthcare, telecommunications, and retail are major contributors, utilizing graph analytics for fraud detection, personalized recommendations, network optimization, and more. Leading vendors are continually innovating to offer scalable, efficient solutions, incorporating advanced features like graph databases and visualization tools.
Event Data and Semantic Header for OCED-PG
zenodo.org
data.niaid.nih.gov
Updated Sep 20, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ava Swevels; Ava Swevels; Dirk Fahland; Dirk Fahland (2023). Event Data and Semantic Header for OCED-PG [Dataset]. http://doi.org/10.5281/zenodo.8296559
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.8296559
Dataset updated
Sep 20, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Ava Swevels; Ava Swevels; Dirk Fahland; Dirk Fahland
License
https://www.gnu.org/licenses/lgpl-3.0-standalone.htmlhttps://www.gnu.org/licenses/lgpl-3.0-standalone.html
Description
Data sets and json files (describing the semantic header and dataset description) to build an Event Knowledge Graph (EKG) using OCED-PG as used in [1].

Provides input data for 6 datasets (BPIC14, BPIC15, BPIC16, BPIC17, BPIC19 and a simulated libraray example).

EKGs are built using OCED-PG, implemented in PromgG v0.1.25. The source code can be found at Github.

To build EKGs using OCED-PG

for one of the BPIC challenges, fork the query code from Github: ekg_bpi_challenges.

for the simulated library example, fork the query code from Github: ekg_library_example.

[1] Swevels, A., Fahland, D., Montali, M.: Implementing Object-Centric Event Data Models in Event Knowledge Graphs (2023)
Wikidata Graph Pattern Benchmark (WGPB) for RDF/SPARQL
zenodo.org
bz2, zip
Updated Jan 11, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aidan Hogan; Aidan Hogan; Cristian Riveros; Cristian Riveros; Carlos Rojas; Carlos Rojas; Adrián Soto; Adrián Soto (2021). Wikidata Graph Pattern Benchmark (WGPB) for RDF/SPARQL [Dataset]. http://doi.org/10.5281/zenodo.4035223
Explore at:
zip, bz2Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.4035223
Dataset updated
Jan 11, 2021
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Aidan Hogan; Aidan Hogan; Cristian Riveros; Cristian Riveros; Carlos Rojas; Carlos Rojas; Adrián Soto; Adrián Soto
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Wikidata Graph Pattern Benchmark (WGPB) is a benchmark consisting of 50 instances of 17 different abstract query patterns giving a total of 850 SPARQL queries. The goal of the benchmark is to test the performance of query engines for more complex basic graph patterns. The benchmark was designed for evaluating worst-case optimal join algorithms but also serves as a general-purpose benchmark for evaluating (basic) graph patterns. The queries are provided in SPARQL syntax and all return at least one solution. We limit the number of results returned to a maximum of 1,000.

Queries

We provide an example of a "square" basic graph pattern (comments are added here for readability):

SELECT * WHERE { ?x1 <http://www.wikidata.org/prop/direct/P149> ?x2 . # architectural style ?x2 <http://www.wikidata.org/prop/direct/P1269> ?x3 . # facet of ?x3 <http://www.wikidata.org/prop/direct/P156> ?x4 . # followed by ?x1 <http://www.wikidata.org/prop/direct/P135> ?x4 . # movement } LIMIT 1000

There are 49 other queries similar to this one in the dataset (replacing the predicates with other predicates), and 50 queries for 16 other abstract query patterns. For more details on these patterns, we refer to the publication mentioned below.

Note that you can try the queries on the public Wikidata Query Service, though some might give a timeout.

Generation

The queries were generated over a reduced version of the Wikidata truthy dump from November 15, 2018 that we call the Wikidata Core Graph (WCG). Specifically, in order to reduce the data volume, multilingual labels, comments, etc., were removed as they have limited use for evaluating joins (English labels were kept under schema:name). Thereafter, in order to facilitate the generation of the queries, triples with rare predicates appearing in fewer than 1,000 triples, and very common predicates appearing in more than 1,000,000 triples, were removed. The queries provided will generate the same results over both graphs.

Files

In this dataset, we then include three files:

wgpb-queries.zip The list of 850 queries

wikidata-wcg.nt.gz Wikidata truthy graph with English labels

wikidata-wcg-filtered.nt.bz2 Wikidata truthy graph with English labels filtering triples with rare (<1000 triples) and very common (>1000000) predicates

Code

We provide the code for generating the datasets, queries, etc., along with scripts and instructions on how to run these queries in a variety of SPARQL engines (Blazegraph, Jena, Virtuoso and our worst-case optimal variant of Jena), .

Publication

The benchmark is proposed, described and used in the following paper. You can find more details about how it was generated, the 17 abstract patterns that were used, as well as results for prominent SPARQL engines.

Aidan Hogan, Cristian Riveros, Carlos Rojas and Adrián Soto. "A Worst-Case Optimal Join Algorithm for SPARQL". In the Proceedings of the 18th International Semantic Web Conference (ISWC), Auckland, New Zealand, October 26–30, 2019.
f
DataSheet2_Threat modelling in Internet of Things (IoT) environments using...
frontiersin.figshare.com
pdf
Updated May 30, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Marwa Salayma (2024). DataSheet2_Threat modelling in Internet of Things (IoT) environments using dynamic attack graphs.pdf [Dataset]. http://doi.org/10.3389/friot.2024.1306465.s002
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.3389/friot.2024.1306465.s002
Dataset updated
May 30, 2024
Dataset provided by
Frontiers
Authors
Marwa Salayma
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This work presents a threat modelling approach to represent changes to the attack paths through an Internet of Things (IoT) environment when the environment changes dynamically, that is, when new devices are added or removed from the system or when whole sub-systems join or leave. The proposed approach investigates the propagation of threats using attack graphs, a popular attack modelling method. However, traditional attack-graph approaches have been applied in static environments that do not continuously change, such as enterprise networks, leading to static and usually very large attack graphs. In contrast, IoT environments are often characterised by dynamic change and interconnections; different topologies for different systems may interconnect with each other dynamically and outside the operator’s control. Such new interconnections lead to changes in the reachability amongst devices according to which their corresponding attack graphs change. This requires dynamic topology and attack graphs for threat and risk analysis. This article introduces an example scenario based on healthcare systems to motivate the work and illustrate the proposed approach. The proposed approach is implemented using a graph database management tool (GDBM), Neo4j, which is a popular tool for mapping, visualising, and querying the graphs of highly connected data. It is efficient in providing a rapid threat modelling mechanism, making it suitable for capturing security changes in the dynamic IoT environment. Our results show that our developed threat modelling approach copes with dynamic system changes that may occur in IoT environments and enables identifying attack paths, whilst allowing for system dynamics. The developed dynamic topology and attack graphs can cope with the changes in the IoT environment efficiently and rapidly by maintaining their associated graphs.
C
Event Graph of BPI Challenge 2019
data.4tu.nl
zip
Updated Apr 22, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dirk Fahland (2021). Event Graph of BPI Challenge 2019 [Dataset]. http://doi.org/10.4121/14169614.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.4121/14169614.v1
Dataset updated
Apr 22, 2021
Dataset provided by
4TU.ResearchData
Authors
Dirk Fahland
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Business process event data modeled as labeled property graphs

Data Format
-----------

The dataset comprises one labeled property graph in two different file formats.

#1) Neo4j .dump format

A neo4j (https://neo4j.com) database dump that contains the entire graph and can be imported into a fresh neo4j database instance using the following command, see also the neo4j documentation: https://neo4j.com/docs/

/bin/neo4j-admin.(bat|sh) load --database=graph.db --from=

The .dump was created with Neo4j v3.5.

#2) .graphml format

A .zip file containing a .graphml file of the entire graph

Data Schema
-----------

The graph is a labeled property graph over business process event data. Each graph uses the following concepts

:Event nodes - each event node describes a discrete event, i.e., an atomic observation described by attribute "Activity" that occurred at the given "timestamp"

:Entity nodes - each entity node describes an entity (e.g., an object or a user), it has an EntityType and an identifier (attribute "ID")

:Log nodes - describes a collection of events that were recorded together, most graphs only contain one log node

:Class nodes - each class node describes a type of observation that has been recorded, e.g., the different types of activities that can be observed, :Class nodes group events into sets of identical observations

:CORR relationships - from :Event to :Entity nodes, describes whether an event is correlated to a specific entity; an event can be correlated to multiple entities

:DF relationships - "directly-followed by" between two :Event nodes describes which event is directly-followed by which other event; both events in a :DF relationship must be correlated to the same entity node. All :DF relationships form a directed acyclic graph.

:HAS relationship - from a :Log to an :Event node, describes which events had been recorded in which event log

:OBSERVES relationship - from an :Event to a :Class node, describes to which event class an event belongs, i.e., which activity was observed in the graph

:REL relationship - placeholder for any structural relationship between two :Entity nodes

The concepts a further defined in Stefan Esser, Dirk Fahland: Multi-Dimensional Event Data in Graph Databases. CoRR abs/2005.14552 (2020) https://arxiv.org/abs/2005.14552

Data Contents
-------------

neo4j-bpic19-2021-02-17 (.dump|.graphml.zip)

An integrated graph describing the raw event data of the entire BPI Challenge 2019 dataset.
van Dongen, B.F. (Boudewijn) (2019): BPI Challenge 2019. 4TU.ResearchData. Collection. https://doi.org/10.4121/uuid:d06aff4b-79f0-45e6-8ec8-e19730c248f1

This data originated from a large multinational company operating from The Netherlands in the area of coatings and paints and we ask participants to investigate the purchase order handling process for some of its 60 subsidiaries. In particular, the process owner has compliance questions. In the data, each purchase order (or purchase document) contains one or more line items. For each line item, there are roughly four types of flows in the data: (1) 3-way matching, invoice after goods receipt: For these items, the value of the goods receipt message should be matched against the value of an invoice receipt message and the value put during creation of the item (indicated by both the GR-based flag and the Goods Receipt flags set to true). (2) 3-way matching, invoice before goods receipt: Purchase Items that do require a goods receipt message, while they do not require GR-based invoicing (indicated by the GR-based IV flag set to false and the Goods Receipt flags set to true). For such purchase items, invoices can be entered before the goods are receipt, but they are blocked until goods are received. This unblocking can be done by a user, or by a batch process at regular intervals. Invoices should only be cleared if goods are received and the value matches with the invoice and the value at creation of the item. (3) 2-way matching (no goods receipt needed): For these items, the value of the invoice should match the value at creation (in full or partially until PO value is consumed), but there is no separate goods receipt message required (indicated by both the GR-based flag and the Goods Receipt flags set to false). (4)Consignment: For these items, there are no invoices on PO level as this is handled fully in a separate process. Here we see GR indicator is set to true but the GR IV flag is set to false and also we know by item type (consignment) that we do not expect an invoice against this item. Unfortunately, the complexity of the data goes further than just this division in four categories. For each purchase item, there can be many goods receipt messages and corresponding invoices which are subsequently paid. Consider for example the process of paying rent. There is a Purchase Document with one item for paying rent, but a total of 12 goods receipt messages with (cleared) invoices with a value equal to 1/12 of the total amount. For logistical services, there may even be hundreds of goods receipt messages for one line item. Overall, for each line item, the amounts of the line item, the goods receipt messages (if applicable) and the invoices have to match for the process to be compliant. Of course, the log is anonymized, but some semantics are left in the data, for example: The resources are split between batch users and normal users indicated by their name. The batch users are automated processes executed by different systems. The normal users refer to human actors in the process. The monetary values of each event are anonymized from the original data using a linear translation respecting 0, i.e. addition of multiple invoices for a single item should still lead to the original item worth (although there may be small rounding errors for numerical reasons). Company, vendor, system and document names and IDs are anonymized in a consistent way throughout the log. The company has the key, so any result can be translated by them to business insights about real customers and real purchase documents.

The case ID is a combination of the purchase document and the purchase item. There is a total of 76,349 purchase documents containing in total 251,734 items, i.e. there are 251,734 cases. In these cases, there are 1,595,923 events relating to 42 activities performed by 627 users (607 human users and 20 batch users). Sometimes the user field is empty, or NONE, which indicates no user was recorded in the source system. For each purchase item (or case) the following attributes are recorded: concept:name: A combination of the purchase document id and the item id, Purchasing Document: The purchasing document ID, Item: The item ID, Item Type: The type of the item, GR-Based Inv. Verif.: Flag indicating if GR-based invoicing is required (see above), Goods Receipt: Flag indicating if 3-way matching is required (see above), Source: The source system of this item, Doc. Category name: The name of the category of the purchasing document, Company: The subsidiary of the company from where the purchase originated, Spend classification text: A text explaining the class of purchase item, Spend area text: A text explaining the area for the purchase item, Sub spend area text: Another text explaining the area for the purchase item, Vendor: The vendor to which the purchase document was sent, Name: The name of the vendor, Document Type: The document type, Item Category: The category as explained above (3-way with GR-based invoicing, 3-way without, 2-way, consignment).

The data contains the following entities and their events

- PO - Purchase Order documents handled at a large multinational company operating from The Netherlands
- POItem - an item in a Purchase Order document describing a specific item to be purchased
- Resource - the user or worker handling the document or a specific item
- Vendor - the external organization from which an item is to be purchased

Data Size
---------

BPIC19, nodes: 1926651, relationships: 15082099
u
Meshes
mivia.unisa.it
Updated Jan 1, 2013
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2013). Meshes [Dataset]. https://mivia.unisa.it/datasets/graph-database/arg-database/
Explore at:
Dataset updated
Jan 1, 2013
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Dataset of 2D,3D,4D unlabelled regular and nearly-regular meshes ranging from 100 to 1200 nodes
u
MIVIA ARG Dataset
mivia.unisa.it
zenodo.org
text/vf-format
Updated Jan 1, 2013
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
MIVIA Lab (2013). MIVIA ARG Dataset [Dataset]. http://doi.org/10.1016/S0167-8655(02)00253-2
Explore at:
text/vf-formatAvailable download formats
Unique identifier
https://doi.org/10.1016/S0167-8655(02)00253-2
Dataset updated
Jan 1, 2013
Dataset authored and provided by
MIVIA Lab
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The ARG Database is a huge collection of labeled and unlabeled graphs realized by the MIVIA Group. The aim of this collection is to provide the graph research community with a standard test ground for the benchmarking of graph matching algorithms.
d
Alesco Phone ID Database - Identity Graph Data with over 860 Million Phone...
datarade.ai
.csv, .xls, .txt
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alesco Data, Alesco Phone ID Database - Identity Graph Data with over 860 Million Phone Number, covers 94% of the US population - available for licensing! [Dataset]. https://datarade.ai/data-products/alesco-phone-id-database-identity-graph-data-with-over-598-alesco-data
Explore at:
.csv, .xls, .txtAvailable download formats
Dataset authored and provided by
Alesco Data
Area covered
United States
Description
Alesco Phone ID: Your Comprehensive Identity Graph Solution

In today's complex data landscape, having a clear and accurate view of your customers is essential. Alesco Phone ID provides the foundation for building a robust Identity Graph that delivers unparalleled insights. Our database is a rich source of Identity Data, including Phone Number Data / Telemarketing Data, that enables you to connect with your audience more effectively.

At the heart of our solution is Identity Linkage Data. By combining advanced data matching techniques with a vast array of public and private data sources, we create a powerful Identity Graph that links Phone Number Data to real people. This enables you to build detailed customer profiles, identify new opportunities, and optimize your marketing campaigns.

With over 860 million Phone Number Data points, including landlines, mobiles, and VoIP, our database offers unmatched coverage. Our proprietary technology processes an impressive 100 million phone signals daily, ensuring data accuracy and freshness. This continuous validation process guarantees that your Identity Graph is always up-to-date.

To provide maximum flexibility, we offer our Phone ID database as an on-premise solution. This gives you complete control over your Identity Data and allows you to integrate it seamlessly into your existing systems.

By leveraging Alesco Phone ID, you can:

Enhance your customer understanding through a robust Identity Graph Improve campaign targeting and personalization with precise Phone Number Data Optimize your Telemarketing efforts with accurate contact information Strengthen fraud prevention and identity verification with reliable Identity Linkage Data

Ready to elevate your data strategy? Contact Alesco today to learn how our Phone ID database can be the cornerstone of your Identity Graph solution.
f
DataSheet_2_NG-Tax 2.0: A Semantic Framework for High-Throughput Amplicon...
frontiersin.figshare.com
xlsx
Updated Jun 9, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Wasin Poncheewin; Gerbern D. A. Hermes; Jesse C. J. van Dam; Jasper J. Koehorst; Hauke Smidt; Peter J. Schaap (2023). DataSheet_2_NG-Tax 2.0: A Semantic Framework for High-Throughput Amplicon Analysis.xlsx [Dataset]. http://doi.org/10.3389/fgene.2019.01366.s002
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.3389/fgene.2019.01366.s002
Dataset updated
Jun 9, 2023
Dataset provided by
Frontiers
Authors
Wasin Poncheewin; Gerbern D. A. Hermes; Jesse C. J. van Dam; Jasper J. Koehorst; Hauke Smidt; Peter J. Schaap
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
NG-Tax 2.0 is a semantic framework for FAIR high-throughput analysis and classification of marker gene amplicon sequences including bacterial and archaeal 16S ribosomal RNA (rRNA), eukaryotic 18S rRNA and ribosomal intergenic transcribed spacer sequences. It can directly use single or merged reads, paired-end reads and unmerged paired-end reads from long range fragments as input to generate de novo amplicon sequence variants (ASV). Using the RDF data model, ASV’s can be automatically stored in a graph database as objects that link ASV sequences with the full data-wise and element-wise provenance, thereby achieving the level of interoperability required to utilize such data to its full potential. The graph database can be directly queried, allowing for comparative analyses of over thousands of samples and is connected with an interactive Rshiny toolbox for analysis and visualization of (meta) data. Additionally, NG-Tax 2.0 exports an extended BIOM 1.0 (JSON) file as starting point for further analyses by other means. The extended BIOM file contains new attribute types to include information about the command arguments used, the sequences of the ASVs formed, classification confidence scores and is backwards compatible. The performance of NG-Tax 2.0 was compared with DADA2, using the plugin in the QIIME 2 analysis pipeline. Fourteen 16S rRNA gene amplicon mock community samples were obtained from the literature and evaluated. Precision of NG-Tax 2.0 was significantly higher with an average of 0.95 vs 0.58 for QIIME2-DADA2 while recall was comparable with an average of 0.85 and 0.77, respectively. NG-Tax 2.0 is written in Java. The code, the ontology, a Galaxy platform implementation, the analysis toolbox, tutorials and example SPARQL queries are freely available at http://wurssb.gitlab.io/ngtax under the MIT License.
Most popular database management systems worldwide 2024
statista.com
Updated Jun 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2024). Most popular database management systems worldwide 2024 [Dataset]. https://www.statista.com/statistics/809750/worldwide-popularity-ranking-database-management-systems/
Explore at:
Dataset updated
Jun 19, 2024
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Jun 2024
Area covered
Worldwide
Description
As of June 2024, the most popular database management system (DBMS) worldwide was Oracle, with a ranking score of 1244.08; MySQL and Microsoft SQL server rounded out the top three. Although the database management industry contains some of the largest companies in the tech industry, such as Microsoft, Oracle and IBM, a number of free and open-source DBMSs such as PostgreSQL and MariaDB remain competitive. Database Management Systems As the name implies, DBMSs provide a platform through which developers can organize, update, and control large databases. Given the business world’s growing focus on big data and data analytics, knowledge of SQL programming languages has become an important asset for software developers around the world, and database management skills are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMS are also integral to the way that consumers access information through applications, which further illustrates the importance of the software.
TESTAR State Model extracted while executing MyThaiStar as web system under...
zenodo.org
data.niaid.nih.gov
application/gzip, bin
Updated Jun 29, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Fernando Pastor Ricos; Tanja E. Vos; Tanja E. Vos; Fernando Pastor Ricos (2020). TESTAR State Model extracted while executing MyThaiStar as web system under test [Dataset]. http://doi.org/10.5281/zenodo.3895790
Explore at:
application/gzip, binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.3895790
Dataset updated
Jun 29, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Fernando Pastor Ricos; Tanja E. Vos; Tanja E. Vos; Fernando Pastor Ricos
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
TESTAR extracted State Model datasets with TESTAR tool using MyThaiStar web application as System Under Test (SUT). This State Models has been generated to be used as an example to be automatically generated and introduced locally in DECODER PKM, from H2020 DECODER Project.

TESTAR tool is an open source tool (www.testar.org) for automated testing through graphical user interface (GUI) currently being developed by the Universitat Politecnica de Valencia and the Open University of the Netherlands.

MyThaiStar (github.com/devonfw/my-thai-star) is the reference application that Capgemini uses internally to promote best programming practices and the correct use of last technologies. It’s is developed with Devon Framework, the standard tool for development at the company.

PKM is the Persistent Knowledge Monitor developed as main infrastructure from H2020 DECODER Project (www.decoder-project.eu) under grant agreement number 824231.

As TESTAR explores automatically the SUT, it will use the Document Object Model (DOM) information extracted from MyThaiStar SUT, to generate and save a TESTAR State Model in the OrientDB graph database. This model contains information about the Widgets, States and Actions, that were found in the SUT.

- MyThaiStar.json.gz: JSON file exported from OrientDB that contains a database with the TESTAR State Model. It can be imported into OrientDB using the TESTAR tool, to analyze and interact with the State Model.

- ArtefactStateModel_MyThaiStar_2020.1_zpnffj5c3407972370_2020-06-15_12h14m24s: for DECODER project purposes, the knowledge extracted with TESTAR in the generation of the State Model has been summarized and referenced in an artifact JSON file to be adapted to PKM input requirements.

Socially Stigmatized Diseases Market Analysis North America, Europe, Asia,...

technavio.com

Updated Nov 30, 2024

Facebook

Twitter

Click to copy link

Link copied

Cite

Technavio (2024). Socially Stigmatized Diseases Market Analysis North America, Europe, Asia, Rest of World (ROW) - US, Germany, UK, Canada, France, China, Japan, India, Italy, South Korea - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/socially-stigmatized-diseases-market-industry-analysis

Explore at:

Dataset updated

Nov 30, 2024

Dataset provided by

TechNavio

Authors

Technavio

Time period covered

2021 - 2025

Area covered

United Kingdom, South Korea, Japan, Canada, United States, Global

Description

Snapshot img

Socially Stigmatized Diseases Market Size 2024-2028

The socially stigmatized diseases market size is forecast to increase by USD 343.5 billion, at a CAGR of 8.2% between 2023 and 2028.

The market is experiencing significant growth due to several key factors. One of the primary drivers is the high prevalence of diseases such as HIV and AIDS, which necessitates the development of effective treatments and delivery routes. Another trend is the increasing availability of medication for these diseases online, making it more accessible to those who may face social stigma or geographical barriers. However, there are also challenges that must be addressed, including bottlenecks in the supply chain and the persistent social stigma surrounding these diseases, which can hinder access to healthcare and treatment. To address these challenges, it is essential to focus on improving delivery routes and addressing the root causes of social stigma. By doing so, we can ensure that those in need have access to the care and treatment they require, ultimately leading to better health outcomes and a reduced burden on healthcare systems.

What will be the Size of the Market During the Forecast Period?

Request Free Sample

The market encompasses a range of health conditions that carry a significant social stigma, often leading to discrimination and marginalization. Effective data management and real-time analytics are crucial for understanding the dynamics of these diseases, improving public health interventions, and mitigating their societal impact. Data management in the market faces unique challenges. The lack of standardization in data collection, storage, and sharing across various industries and sectors hampers comprehensive disease surveillance and response efforts. Moreover, the sensitive nature of medical information associated with these diseases necessitates strong data security measures. Graph databases and property graph models offer promising solutions to address these challenges. Graph databases enable efficient handling of complex relationships between vertices (nodes) and edges (connections) in data. Labels can be assigned to vertices to represent different disease entities, while indexes facilitate rapid data retrieval.
In the market, long tasks and stored procedures are essential for processing large volumes of data in real-time. Real-time analytics enables logistics professionals in the finance and logistics industries to optimize routes, manage warehouses, and ensure efficient disease surveillance. The finance industry can leverage data modeling and visualization tools to identify patterns and trends in financial transactions related to socially stigmatized diseases. This information can inform investment strategies and risk assessments. In the logistics industry, real-time analytics can optimize supply chain operations and improve response times to disease outbreaks. Data centers and cloud regions play a vital role market by providing secure, scalable, and cost-effective storage solutions. Business processes can be automated using programming ease and visualization tools, enabling more efficient data analysis and decision-making.

How is this market segmented and which is the largest segment?

The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

Disease Type

  STIs
  Mental health disorders
  Cancer
  Others


Therapy

  Medications
  Therapy and counseling
  Others


Geography

  North America

    Canada
    US


  Europe

    Germany
    UK
    France
    Italy


  Asia

    China
    India
    Japan
    South Korea


  Rest of World (ROW)

By Disease Type Insights

The STIs segment is estimated to witness significant growth during the forecast period.

Sexually transmitted infections (STIs), also known as sexually transmitted diseases (STDs), are a significant sector within the global market for socially stigmatized diseases. These infections are contracted through sexual contact, affecting various parts of the body including the mouth, anus, vagina, and penis. STIs encompass a range of conditions, each presenting unique symptoms and health consequences. Common symptoms include discomfort such as burning sensations and itching in the genital area, as well as discharge. In India, as of 2023, the prevalence of STIs is substantial, with around 6% of the adult population reportedly affected by one or more STIs or reproductive tract infections (RTIs). This equates to approximately 30 to 35 million cases annually, underscoring the considerable public health challenge posed by these infections.

Get a glance at the market report of share of various segments Request Free Sample

The STIs segment was valued at USD 254.10 billion in 2018 and

Z
Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic...
data.niaid.nih.gov
zenodo.org
Updated Dec 24, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lignos, Dimitrios G. (2022). Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6965146
Explore at:
Dataset updated
Dec 24, 2022
Dataset provided by
Hartloper, Alexander R.
de Castro e Sousa, Albano
Lignos, Dimitrios G.
Ozden, Selimcan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials

Background

This dataset contains data from monotonic and cyclic loading experiments on structural metallic materials. The materials are primarily structural steels and one iron-based shape memory alloy is also included. Summary files are included that provide an overview of the database and data from the individual experiments is also included.

The files included in the database are outlined below and the format of the files is briefly described. Additional information regarding the formatting can be found through the post-processing library (https://github.com/ahartloper/rlmtp/tree/master/protocols).

Usage

The data is licensed through the Creative Commons Attribution 4.0 International.

If you have used our data and are publishing your work, we ask that you please reference both:

this database through its DOI, and

any publication that is associated with the experiments. See the Overall_Summary and Database_References files for the associated publication references.

Included Files

Overall_Summary_2022-08-25_v1-0-0.csv: summarises the specimen information for all experiments in the database.

Summarized_Mechanical_Props_Campaign_2022-08-25_v1-0-0.csv: summarises the average initial yield stress and average initial elastic modulus per campaign.

Unreduced_Data-#_v1-0-0.zip: contain the original (not downsampled) data

Where # is one of: 1, 2, 3, 4, 5, 6. The unreduced data is broken into separate archives because of upload limitations to Zenodo. Together they provide all the experimental data.

We recommend you un-zip all the folders and place them in one "Unreduced_Data" directory similar to the "Clean_Data"

The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.

There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the unreduced data.

The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.

Clean_Data_v1-0-0.zip: contains all the downsampled data

The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.

There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the clean data.

The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.

Database_References_v1-0-0.bib

Contains a bibtex reference for many of the experiments in the database. Corresponds to the "citekey" entry in the summary files.

File Format: Downsampled Data

These are the "LP_

The header of the first column is empty: the first column corresponds to the index of the sample point in the original (unreduced) data

Time[s]: time in seconds since the start of the test

e_true: true strain

Sigma_true: true stress in MPa

(optional) Temperature[C]: the surface temperature in degC

These data files can be easily loaded using the pandas library in Python through:

import pandas data = pandas.read_csv(data_file, index_col=0)

The data is formatted so it can be used directly in RESSPyLab (https://github.com/AlbanoCastroSousa/RESSPyLab). Note that the column names "e_true" and "Sigma_true" were kept for backwards compatibility reasons with RESSPyLab.

File Format: Unreduced Data

These are the "LP_

The first column is the index of each data point

S/No: sample number recorded by the DAQ

System Date: Date and time of sample

Time[s]: time in seconds since the start of the test

C_1_Force[kN]: load cell force

C_1_Déform1[mm]: extensometer displacement

C_1_Déplacement[mm]: cross-head displacement

Eng_Stress[MPa]: engineering stress

Eng_Strain[]: engineering strain

e_true: true strain

Sigma_true: true stress in MPa

(optional) Temperature[C]: specimen surface temperature in degC

The data can be loaded and used similarly to the downsampled data.

File Format: Overall_Summary

The overall summary file provides data on all the test specimens in the database. The columns include:

hidden_index: internal reference ID

grade: material grade

spec: specifications for the material

source: base material for the test specimen

id: internal name for the specimen

lp: load protocol

size: type of specimen (M8, M12, M20)

gage_length_mm_: unreduced section length in mm

avg_reduced_dia_mm_: average measured diameter for the reduced section in mm

avg_fractured_dia_top_mm_: average measured diameter of the top fracture surface in mm

avg_fractured_dia_bot_mm_: average measured diameter of the bottom fracture surface in mm

fy_n_mpa_: nominal yield stress

fu_n_mpa_: nominal ultimate stress

t_a_deg_c_: ambient temperature in degC

date: date of test

investigator: person(s) who conducted the test

location: laboratory where test was conducted

machine: setup used to conduct test

pid_force_k_p, pid_force_t_i, pid_force_t_d: PID parameters for force control

pid_disp_k_p, pid_disp_t_i, pid_disp_t_d: PID parameters for displacement control

pid_extenso_k_p, pid_extenso_t_i, pid_extenso_t_d: PID parameters for extensometer control

citekey: reference corresponding to the Database_References.bib file

yield_stress_mpa_: computed yield stress in MPa

elastic_modulus_mpa_: computed elastic modulus in MPa

fracture_strain: computed average true strain across the fracture surface

c,si,mn,p,s,n,cu,mo,ni,cr,v,nb,ti,al,b,zr,sn,ca,h,fe: chemical compositions in units of %mass

file: file name of corresponding clean (downsampled) stress-strain data

File Format: Summarized_Mechanical_Props_Campaign

Meant to be loaded in Python as a pandas DataFrame with multi-indexing, e.g.,

tab1 = pd.read_csv('Summarized_Mechanical_Props_Campaign_' + date + version + '.csv', index_col=[0, 1, 2, 3], skipinitialspace=True, header=[0, 1], keep_default_na=False, na_values='')

citekey: reference in "Campaign_References.bib".

Grade: material grade.

Spec.: specifications (e.g., J2+N).

Yield Stress [MPa]: initial yield stress in MPa

size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign

Elastic Modulus [MPa]: initial elastic modulus in MPa

size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign

Caveats

The files in the following directories were tested before the protocol was established. Therefore, only the true stress-strain is available for each:

A500

A992_Gr50

BCP325

BCR295

HYP400

S460NL

S690QL/25mm

S355J2_Plates/S355J2_N_25mm and S355J2_N_50mm
SparkWiki: Wikipedia graph dataset and pagecounts pre-processing tools
zenodo.org
bin
Updated Feb 19, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nicolas Aspert; Volodymyr Miz; Benjamin Ricaud; Pierre Vandergheynst; Nicolas Aspert; Volodymyr Miz; Benjamin Ricaud; Pierre Vandergheynst (2021). SparkWiki: Wikipedia graph dataset and pagecounts pre-processing tools [Dataset]. http://doi.org/10.1145/3308560.3316757
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.1145/3308560.3316757
Dataset updated
Feb 19, 2021
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Nicolas Aspert; Volodymyr Miz; Benjamin Ricaud; Pierre Vandergheynst; Nicolas Aspert; Volodymyr Miz; Benjamin Ricaud; Pierre Vandergheynst
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
SparkWiki toolkit can be used in various scenarios where you are interested in researching Wikipedia graph and pageview statistics. Graph and pageviews can be used and studied separately. The code used to process Wikipedia SQL dumps, along with deployment instructions, are located on GitHub.

To test an example of a pre-processed graph, you can download a dump of the English Wikipedia graph (see attached wikipedia_nrc.dump), which you can directly import into a Neo4J instance. The dump is intended for neo4j version 3.x and can be imported using the following command (make sure you do not have an existing wikipedia.db database as the command below will overwrite its content):

sudo -u neo4j neo4j-admin load --force --from=wikipedia_nrc.dump --database=wikipedia.db

If you try to import it into Neo4J version 4.x, you need to set the property

dbms.allow_upgrade=true in /etc/neo4j/neo4j.conf

before importing. When you start the neo4j server it will upgrade the database s.t. it is compatible with version 4.x.

Autonomous Delivery Robots Market Analysis North America, Europe, APAC,...

technavio.com

Facebook

Twitter

Click to copy link

Link copied

Cite

Technavio, Autonomous Delivery Robots Market Analysis North America, Europe, APAC, Middle East and Africa, South America - US, Germany, France, Canada, UK - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/autonomous-delivery-robots-market-industry-analysis

Explore at:

Dataset provided by

TechNavio

Authors

Technavio

Time period covered

2021 - 2025

Area covered

Global

Description

Snapshot img

Autonomous Delivery Robots Market Size 2024-2028

The autonomous delivery robots market size is forecast to increase by USD 29.22 billion at a CAGR of 22.2% between 2023 and 2028.

Autonomous delivery robots have gained significant traction in various industries, driven by the increase in e-commerce sales and the increasing focus on reducing carbon footprints. The global market for autonomous delivery robots is expected to witness substantial growth, as organizations adopt these technologies to streamline their supply chain operations and enhance customer experience. The market is witnessing significant growth as businesses seek to optimize their logistics operations and enhance customer experience. However, challenges such as malfunctioning of robots due to technical glitches and navigational complexities persist, which need to be addressed to ensure the widespread adoption of these robots. Events like product launches, partnerships, and collaborations continue to shape the market dynamics, with key locations, including urban areas and campuses, witnessing high demand for these robots. Overall, the market is poised for strong growth, driven by these trends and challenges.

What will be the Size of the Market During the Forecast Period?

Request Free Sample

The market is witnessing significant growth as businesses seek to optimize their logistics operations and enhance customer experience. This market is driven by the integration of advanced technologies such as social networks, recommendation engines, graph databases, and visualization in the development of these robots. Social networks are playing a crucial role in the market by enabling real-time communication between logistics professionals and their customers. Recommendation engines are used to analyze customer data and suggest optimal delivery routes, ensuring timely and accurate deliveries. Graph databases are another key technology that is transforming the market.
Furthermore, with the property graph model, these databases allow for efficient data modeling and management, enabling quick identification of vertices, edges, labels, and indexes. This is essential for long tasks such as route optimization and warehouse management. Data centers and cloud regions are also critical components of the market. Real-time analytics and stored procedures are used to process large volumes of data and provide insights into business processes. This information is crucial for optimizing delivery routes and managing inventory levels. Despite the numerous benefits, the market faces challenges due to the lack of standardization. Programming ease and the need for customization are major concerns for logistics professionals.

How is this market segmented and which is the largest segment?

Type

  Semi-autonomous
  Fully autonomous


Geography

  North America

    Canada
    US


  Europe

    Germany
    UK
    France


  APAC



  Middle East and Africa



  South America

By Type Insights

The semi-autonomous segment is estimated to witness significant growth during the forecast period.

Semi-autonomous delivery robots are revolutionizing the logistics industry by offering a blend of automation and human intervention. These robots, which can be controlled remotely from a central command center, require minimal human input to complete tasks. Companies like Postmates and Marble are leading the charge in this space. The user-friendliness of these robots is a significant advantage, with touchscreens and additional functions such as a help button for customers. Human intervention ensures safety and reliability, making these robots a preferred choice over fully autonomous counterparts. Real-time location monitoring allows for remote path adjustments, ensuring efficient delivery. As the logistics sector continues to evolve, the adoption of semi-autonomous delivery robots is poised to increase, offering benefits in data management, medical information, and disease surveillance.

Get a glance at the market report of share of various segments Request Free Sample

The semi-autonomous segment was valued at USD 8.85 billion in 2018 and showed a gradual increase during the forecast period.

Regional Analysis

North America is estimated to contribute 44% to the growth of the global market during the forecast period.

Technavio's analysts have elaborately explained the regional trends and drivers that shape the market during the forecast period.

For more insights on the market share of various regions Request Free Sample

In the market, North America holds a significant portion due to escalating last-mile delivery expenses and the expanding util

u
KGCW 2023 Challenge @ ESWC 2023
investigacion.usc.es
investigacion.usc.gal
+1more
Updated 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Van Assche, Dylan; Chaves-Fraga, David; Dimou, Anastasia; Şimşek, Umutcan; Iglesias, Ana; Van Assche, Dylan; Chaves-Fraga, David; Dimou, Anastasia; Şimşek, Umutcan; Iglesias, Ana (2023). KGCW 2023 Challenge @ ESWC 2023 [Dataset]. https://investigacion.usc.es/documentos/668fc445b9e7c03b01bd84ec?lang=en
Explore at:
Dataset updated
2023
Authors
Van Assche, Dylan; Chaves-Fraga, David; Dimou, Anastasia; Şimşek, Umutcan; Iglesias, Ana; Van Assche, Dylan; Chaves-Fraga, David; Dimou, Anastasia; Şimşek, Umutcan; Iglesias, Ana
Description
Knowledge Graph Construction Workshop 2023: challenge Knowledge graph construction of heterogeneous data has seen a lot of uptake
in the last decade from compliance to performance optimizations with respect
to execution time. Besides execution time as a metric for comparing knowledge
graph construction, other metrics e.g. CPU or memory usage are not considered.
This challenge aims at benchmarking systems to find which RDF graph
construction system optimizes for metrics e.g. execution time, CPU,
memory usage, or a combination of these metrics. Task description The task is to reduce and report the execution time and computing resources
(CPU and memory usage) for the parameters listed in this challenge, compared
to the state-of-the-art of the existing tools and the baseline results provided
by this challenge. This challenge is not limited to execution times to create
the fastest pipeline, but also computing resources to achieve the most efficient
pipeline. We provide a tool which can execute such pipelines end-to-end. This tool also
collects and aggregates the metrics such as execution time, CPU and memory
usage, necessary for this challenge as CSV files. Moreover, the information
about the hardware used during the execution of the pipeline is available as
well to allow fairly comparing different pipelines. Your pipeline should consist
of Docker images which can be executed on Linux to run the tool. The tool is
already tested with existing systems, relational databases e.g. MySQL and
PostgreSQL, and triplestores e.g. Apache Jena Fuseki and OpenLink Virtuoso
which can be combined in any configuration. It is strongly encouraged to use
this tool for participating in this challenge. If you prefer to use a different
tool or our tool imposes technical requirements you cannot solve, please contact
us directly. Part 1: Knowledge Graph Construction Parameters These parameters are evaluated using synthetic generated data to have more
insights of their influence on the pipeline. Data Number of data records: scaling the data size vertically by the number of records with a fixed number of data properties (10K, 100K, 1M, 10M records). Number of data properties: scaling the data size horizontally by the number of data properties with a fixed number of data records (1, 10, 20, 30 columns). Number of duplicate values: scaling the number of duplicate values in the dataset (0%, 25%, 50%, 75%, 100%). Number of empty values: scaling the number of empty values in the dataset (0%, 25%, 50%, 75%, 100%). Number of input files: scaling the number of datasets (1, 5, 10, 15). Mappings Number of subjects: scaling the number of subjects with a fixed number of predicates and objects (1, 10, 20, 30 TMs). Number of predicates and objects: scaling the number of predicates and objects with a fixed number of subjects (1, 10, 20, 30 POMs). Number of and type of joins: scaling the number of joins and type of joins (1-1, N-1, 1-N, N-M) Part 2: GTFS-Madrid-Bench The GTFS-Madrid-Bench provides insights in the pipeline with real data from the
public transport domain in Madrid. Scaling GTFS-1 SQL GTFS-10 SQL GTFS-100 SQL GTFS-1000 SQL Heterogeneity GTFS-100 XML + JSON GTFS-100 CSV + XML GTFS-100 CSV + JSON GTFS-100 SQL + XML + JSON + CSV Example pipeline The ground truth dataset and baseline results are generated in different steps
for each parameter: The provided CSV files and SQL schema are loaded into a MySQL relational database. Mappings are executed by accessing the MySQL relational database to construct a knowledge graph in N-Triples as RDF format. The constructed knowledge graph is loaded into a Virtuoso triplestore, tuned according to the Virtuoso documentation. The provided SPARQL queries are executed on the SPARQL endpoint exposed by Virtuoso. The pipeline is executed 5 times from which the median execution time of each
step is calculated and reported. Each step with the median execution time is
then reported in the baseline results with all its measured metrics.
Query timeout is set to 1 hour and knowledge graph construction timeout
to 24 hours. The execution is performed with the following tool
Data from: KGCW 2024 Challenge @ ESWC 2024
zenodo.org
investigacion.usc.gal
+1more
application/gzip
Updated Apr 15, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dylan Van Assche; Dylan Van Assche; David Chaves-Fraga; David Chaves-Fraga; Anastasia Dimou; Anastasia Dimou; Umutcan Serles; Umutcan Serles; Ana Iglesias; Ana Iglesias (2024). KGCW 2024 Challenge @ ESWC 2024 [Dataset]. http://doi.org/10.5281/zenodo.10973433
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.10973433
Dataset updated
Apr 15, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Dylan Van Assche; Dylan Van Assche; David Chaves-Fraga; David Chaves-Fraga; Anastasia Dimou; Anastasia Dimou; Umutcan Serles; Umutcan Serles; Ana Iglesias; Ana Iglesias
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Knowledge Graph Construction Workshop 2024: challenge

Knowledge graph construction of heterogeneous data has seen a lot of uptake
in the last decade from compliance to performance optimizations with respect
to execution time. Besides execution time as a metric for comparing knowledge
graph construction, other metrics e.g. CPU or memory usage are not considered.
This challenge aims at benchmarking systems to find which RDF graph
construction system optimizes for metrics e.g. execution time, CPU,
memory usage, or a combination of these metrics.

Task description

The task is to reduce and report the execution time and computing resources
(CPU and memory usage) for the parameters listed in this challenge, compared
to the state-of-the-art of the existing tools and the baseline results provided
by this challenge. This challenge is not limited to execution times to create
the fastest pipeline, but also computing resources to achieve the most efficient
pipeline.

We provide a tool which can execute such pipelines end-to-end. This tool also
collects and aggregates the metrics such as execution time, CPU and memory
usage, necessary for this challenge as CSV files. Moreover, the information
about the hardware used during the execution of the pipeline is available as
well to allow fairly comparing different pipelines. Your pipeline should consist
of Docker images which can be executed on Linux to run the tool. The tool is
already tested with existing systems, relational databases e.g. MySQL and
PostgreSQL, and triplestores e.g. Apache Jena Fuseki and OpenLink Virtuoso
which can be combined in any configuration. It is strongly encouraged to use
this tool for participating in this challenge. If you prefer to use a different
tool or our tool imposes technical requirements you cannot solve, please contact
us directly.

Track 1: Conformance

The set of new specification for the RDF Mapping Language (RML) established by the W3C Community Group on Knowledge Graph Construction provide a set of test-cases for each module:

RML-Core

RML-IO

RML-CC

RML-FNML

RML-Star

These test-cases are evaluated in this Track of the Challenge to determine their feasibility, correctness, etc. by applying them in implementations. This Track is in Beta status because these new specifications have not seen any implementation yet, thus it may contain bugs and issues. If you find problems with the mappings, output, etc. please report them to the corresponding repository of each module.

Note: validating the output of the RML Star module automatically through the provided tooling is currently not possible, see https://github.com/kg-construct/challenge-tool/issues/1.

Through this Track we aim to spark development of implementations for the new specifications and improve the test-cases. Let us know your problems with the test-cases and we will try to find a solution.

Track 2: Performance

Part 1: Knowledge Graph Construction Parameters

These parameters are evaluated using synthetic generated data to have more
insights of their influence on the pipeline.

Data

Number of data records: scaling the data size vertically by the number of records with a fixed number of data properties (10K, 100K, 1M, 10M records).

Number of data properties: scaling the data size horizontally by the number of data properties with a fixed number of data records (1, 10, 20, 30 columns).

Number of duplicate values: scaling the number of duplicate values in the dataset (0%, 25%, 50%, 75%, 100%).

Number of empty values: scaling the number of empty values in the dataset (0%, 25%, 50%, 75%, 100%).

Number of input files: scaling the number of datasets (1, 5, 10, 15).

Mappings

Number of subjects: scaling the number of subjects with a fixed number of predicates and objects (1, 10, 20, 30 TMs).

Number of predicates and objects: scaling the number of predicates and objects with a fixed number of subjects (1, 10, 20, 30 POMs).

Number of and type of joins: scaling the number of joins and type of joins (1-1, N-1, 1-N, N-M)

Part 2: GTFS-Madrid-Bench

The GTFS-Madrid-Bench provides insights in the pipeline with real data from the
public transport domain in Madrid.

Scaling

GTFS-1 SQL

GTFS-10 SQL

GTFS-100 SQL

GTFS-1000 SQL

Heterogeneity

GTFS-100 XML + JSON

GTFS-100 CSV + XML

GTFS-100 CSV + JSON

GTFS-100 SQL + XML + JSON + CSV

Example pipeline

The ground truth dataset and baseline results are generated in different steps
for each parameter:

The provided CSV files and SQL schema are loaded into a MySQL relational database.

Mappings are executed by accessing the MySQL relational database to construct a knowledge graph in N-Triples as RDF format

The pipeline is executed 5 times from which the median execution time of each
step is calculated and reported. Each step with the median execution time is
then reported in the baseline results with all its measured metrics.
Knowledge graph construction timeout is set to 24 hours.
The execution is performed with the following tool: https://github.com/kg-construct/challenge-tool,
you can adapt the execution plans for this example pipeline to your own needs.

Each parameter has its own directory in the ground truth dataset with the
following files:

Input dataset as CSV.

Mapping file as RML.

Execution plan for the pipeline in metadata.json.

Datasets

Knowledge Graph Construction Parameters

The dataset consists of:

Input dataset as CSV for each parameter.

Mapping file as RML for each parameter.

Baseline results for each parameter with the example pipeline.

Ground truth dataset for each parameter generated with the example pipeline.

Format

All input datasets are provided as CSV, depending on the parameter that is being
evaluated, the number of rows and columns may differ. The first row is always
the header of the CSV.

GTFS-Madrid-Bench

The dataset consists of:

Input dataset as CSV with SQL schema for the scaling and a combination of XML,

CSV, and JSON is provided for the heterogeneity.

Mapping file as RML for both scaling and heterogeneity.

SPARQL queries to retrieve the results.

Baseline results with the example pipeline.

Ground truth dataset generated with the example pipeline.

Format

CSV datasets always have a header as their first row.
JSON and XML datasets have their own schema.

Evaluation criteria

Submissions must evaluate the following metrics:

Execution time of all the steps in the pipeline. The execution time of a step is the difference between the begin and end time of a step.

CPU time as the time spent in the CPU for all steps of the pipeline. The CPU time of a step is the difference between the begin and end CPU time of a step.

Minimal and maximal memory consumption for each step of the pipeline. The minimal and maximal memory consumption of a step is the minimum and maximum calculated of the memory consumption during the execution of a step.

Expected output

Duplicate values

Scale Number of Triples
0 percent 2000000 triples
25 percent 1500020 triples
50 percent 1000020 triples
75 percent 500020 triples
100 percent 20 triples

Empty values

Scale Number of Triples
0 percent 2000000 triples
25 percent 1500000 triples
50 percent 1000000 triples
75 percent 500000 triples
100 percent 0 triples

Mappings

Scale Number of Triples
1TM + 15POM 1500000 triples
3TM + 5POM 1500000 triples
5TM + 3POM 1500000 triples
15TM + 1POM 1500000 triples

Properties

Scale Number of Triples
1M rows 1 column 1000000 triples
1M rows 10
m
Database for design of solar cell active layer through genetic algorithm
data.mendeley.com
Updated Jul 26, 2019
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Caine Ardayfio (2019). Database for design of solar cell active layer through genetic algorithm [Dataset]. http://doi.org/10.17632/rvdnt639c2.2
Explore at:
Unique identifier
https://doi.org/10.17632/rvdnt639c2.2
Dataset updated
Jul 26, 2019
Authors
Caine Ardayfio
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
Microstructure design is a crucial part of developing organic solar cells. Organic solar cells have the potential to become ubiquitous amongst power generation due to their inexpensiveness and ease of fabrication. Although achievements in the chemical properties of the solar cells have been achieved in recent years, a lack of progress in morphology has greatly inhibited organic solar cell adoption. In this, it is illustrated how high-performance microstructures can be developed rapidly via a graph-based strategy. This is in stark contrast to the trial-and-error methods currently employed for organic solar cell microstructure optimization. Treating the microstructure of a material system as graphs allows modular and extensible models that are simple to query and evaluate. The graph surrogate model quickly maps the microstructures properties and integrates well with optimization algorithms while elegantly integrating prior domain knowledge into the microstructure design process. This use of graph-based modeling and probabilistic optimization results in a microstructure design with a 40.29% higher efficiency than conventional solar designs. Fractal analysis was also used to further prove the validity of the designed morphologies. This was accomplished through analyzing models analogous to the function of the solar cell and comparing their similarity with the designed fractal structure. To conclude, graph-based probabilistic optimization led to the identification of a class of microstructures that feature significantly higher efficiencies than currently leading solar cells. It is anticipated that coupling this method with fractal analysis techniques will be widespread for use in optimizing material morphologies. The following dataset includes all code used in microstructure design and fractal analysis, specifically: the creation of a weighted, undirected graph representing the microstructure configuration of chemicals in the solar cell; the approximation of solar cell efficiency through graph-based querying; the optimization of the system through a probabilistic genetic algorithm; and fractal dimension calculator.
P
Pubmed Dataset
paperswithcode.com
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Prithviraj Sen; Galileo Namata; Mustafa Bilgic; Lise Getoor; Brian Gallagher; Tina Eliassi-Rad, Pubmed Dataset [Dataset]. https://paperswithcode.com/dataset/pubmed
Explore at:
Authors
Prithviraj Sen; Galileo Namata; Mustafa Bilgic; Lise Getoor; Brian Gallagher; Tina Eliassi-Rad
Description
The PubMed dataset consists of 19717 scientific publications from PubMed database pertaining to diabetes classified into one of three classes. The citation network consists of 44338 links. Each publication in the dataset is described by a TF/IDF weighted word vector from a dictionary which consists of 500 unique words.
f
SECOM: A Novel Hash Seed and Community Detection Based-Approach for...
plos.figshare.com
pdf
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ming Fan; Ka-Chun Wong; Taewoo Ryu; Timothy Ravasi; Xin Gao (2023). SECOM: A Novel Hash Seed and Community Detection Based-Approach for Genome-Scale Protein Domain Identification [Dataset]. http://doi.org/10.1371/journal.pone.0039475
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0039475
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Ming Fan; Ka-Chun Wong; Taewoo Ryu; Timothy Ravasi; Xin Gao
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
With rapid advances in the development of DNA sequencing technologies, a plethora of high-throughput genome and proteome data from a diverse spectrum of organisms have been generated. The functional annotation and evolutionary history of proteins are usually inferred from domains predicted from the genome sequences. Traditional database-based domain prediction methods cannot identify novel domains, however, and alignment-based methods, which look for recurring segments in the proteome, are computationally demanding. Here, we propose a novel genome-wide domain prediction method, SECOM. Instead of conducting all-against-all sequence alignment, SECOM first indexes all the proteins in the genome by using a hash seed function. Local similarity can thus be detected and encoded into a graph structure, in which each node represents a protein sequence and each edge weight represents the shared hash seeds between the two nodes. SECOM then formulates the domain prediction problem as an overlapping community-finding problem in this graph. A backward graph percolation algorithm that efficiently identifies the domains is proposed. We tested SECOM on five recently sequenced genomes of aquatic animals. Our tests demonstrated that SECOM was able to identify most of the known domains identified by InterProScan. When compared with the alignment-based method, SECOM showed higher sensitivity in detecting putative novel domains, while it was also three orders of magnitude faster. For example, SECOM was able to predict a novel sponge-specific domain in nucleoside-triphosphatase (NTPases). Furthermore, SECOM discovered two novel domains, likely of bacterial origin, that are taxonomically restricted to sea anemone and hydra. SECOM is an open-source program and available at http://sfb.kaust.edu.sa/Pages/Software.aspx.

Scale	Number of Triples
0 percent	2000000 triples
25 percent	1500020 triples
50 percent	1000020 triples
75 percent	500020 triples
100 percent	20 triples

Scale	Number of Triples
0 percent	2000000 triples
25 percent	1500000 triples
50 percent	1000000 triples
75 percent	500000 triples
100 percent	0 triples

Scale	Number of Triples
1TM + 15POM	1500000 triples
3TM + 5POM	1500000 triples
5TM + 3POM	1500000 triples
15TM + 1POM	1500000 triples

Scale	Number of Triples
1M rows 1 column	1000000 triples
1M rows 10

Facebook

Twitter

Click to copy link

Link copied

Cite

Cognitive Market Research, The global Graph Analytics market size is USD 2522 million in 2024 and will expand at a compound annual growth rate (CAGR) of 34.0% from 2024 to 2031. [Dataset]. https://www.cognitivemarketresearch.com/graph-analytics-market-report

The global Graph Analytics market size is USD 2522 million in 2024 and will expand at a compound annual growth rate (CAGR) of 34.0% from 2024 to 2031.

Explore at:

pdf,excel,csv,pptAvailable download formats

Dataset authored and provided by

Cognitive Market Research

License

https://www.cognitivemarketresearch.com/privacy-policyhttps://www.cognitivemarketresearch.com/privacy-policy

Time period covered

2021 - 2033

Area covered

Global

Description

According to Cognitive Market Research, the global Graph Analytics market size will be USD 2522 million in 2024 and will expand at a compound annual growth rate (CAGR) of 34.0% from 2024 to 2031. Market Dynamics of Graph Analytics Market

Key Drivers for Graph Analytics Market

Increasing Recognition of the Advantages of Graph Databases- One of the main reasons for the Graph Analytics market is the increasing recognition of the advantages of graph databases. Unlike traditional relational databases, graph databases excel at handling complex relationships and interconnected data, making them ideal for use cases such as fraud detection, recommendation engines, and social network analysis. Businesses are leveraging these capabilities to uncover insights and patterns that were previously difficult to detect. The rise of big data and the need for real-time analytics are further driving the adoption of graph databases, as they offer enhanced performance and scalability for large-scale data sets. Additionally, advancements in artificial intelligence and machine learning are amplifying the value of graph databases, enabling more sophisticated data modeling and predictive analytics.
Growing Uptake of Big Data Tools to Drive the Graph Analytics Market's Expansion in the Years Ahead.

Key Restraints for Graph Analytics Market

Limited Awareness and Understanding pose a serious threat to the Graph Analytics industry.
The market also faces significant difficulties related to data security and privacy.

Introduction of the Graph Analytics Market

The Graph Analytics Market is rapidly expanding, driven by the growing need for advanced data analysis techniques in various sectors. Graph analytics leverages graph structures to represent and analyze relationships and dependencies, providing deeper insights than traditional data analysis methods. Key factors propelling this market include the rise of big data, the increasing adoption of artificial intelligence and machine learning, and the demand for real-time data processing. Industries such as finance, healthcare, telecommunications, and retail are major contributors, utilizing graph analytics for fraud detection, personalized recommendations, network optimization, and more. Leading vendors are continually innovating to offer scalable, efficient solutions, incorporating advanced features like graph databases and visualization tools.

Clear search

Close search

Google apps

Main menu

The global Graph Analytics market size is USD 2522 million in 2024 and will...

Event Data and Semantic Header for OCED-PG

Wikidata Graph Pattern Benchmark (WGPB) for RDF/SPARQL

DataSheet2_Threat modelling in Internet of Things (IoT) environments using...

Event Graph of BPI Challenge 2019

Meshes

MIVIA ARG Dataset

Alesco Phone ID Database - Identity Graph Data with over 860 Million Phone...

DataSheet_2_NG-Tax 2.0: A Semantic Framework for High-Throughput Amplicon...

Most popular database management systems worldwide 2024

TESTAR State Model extracted while executing MyThaiStar as web system under...

Socially Stigmatized Diseases Market Analysis North America, Europe, Asia,...

Snapshot img

Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic...

SparkWiki: Wikipedia graph dataset and pagecounts pre-processing tools

Autonomous Delivery Robots Market Analysis North America, Europe, APAC,...

Snapshot img

KGCW 2023 Challenge @ ESWC 2023

Data from: KGCW 2024 Challenge @ ESWC 2024

Knowledge Graph Construction Workshop 2024: challenge

Track 1: Conformance

Track 2: Performance

Database for design of solar cell active layer through genetic algorithm

Pubmed Dataset

SECOM: A Novel Hash Seed and Community Detection Based-Approach for...

The global Graph Analytics market size is USD 2522 million in 2024 and will expand at a compound annual growth rate (CAGR) of 34.0% from 2024 to 2031.