Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For K4 and Km-e graphs, a coloring of type (K4,Km-e;n) is an edge coloring of the complete graph Kn that contains no K4 subgraph in the first color (represented by the absence of an edge) and no Km-e subgraph in the second color (represented by the presence of an edge). Km-e denotes the complete graph Km with one edge removed. The Ramsey number R(K4,Km-e) is the smallest natural number n such that every edge coloring of the complete graph Kn contains a subgraph isomorphic to K4 in the first color or isomorphic to Km-e in the second color. Colorings of type (K4,Km-e;n) exist for n<R(K4,Km-e).
The dataset consists of:
a) 5 files containing all non-isomorphic graphs that are colorings of type (K4,K3-e;n) for 1<n<7,
b) 9 files containing all non-isomorphic graphs that are colorings of type (K4,K4-e;n) for 1<n<11.
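The two conditions above can be checked mechanically. The following is a minimal sketch (ours, not part of the dataset) of a verifier for colorings of type (K4,Km-e;n): the second color is given by the edge set of a graph G on n vertices, the first color by G's non-edges.

```python
from itertools import combinations

def is_coloring_type(n, edges, m):
    """Check that G = (range(n), edges) encodes a coloring of type (K4, Km-e; n)."""
    edges = {frozenset(e) for e in edges}
    # No K4 in the first color: every 4 vertices must span at least one edge of G
    # (i.e., G has no independent set of size 4).
    for quad in combinations(range(n), 4):
        if not any(frozenset(p) in edges for p in combinations(quad, 2)):
            return False
    # No Km-e in the second color: no m vertices may carry C(m,2)-1 or more edges.
    full = m * (m - 1) // 2
    for sub in combinations(range(n), m):
        count = sum(1 for p in combinations(sub, 2) if frozenset(p) in edges)
        if count >= full - 1:
            return False
    return True

# A 5-cycle works for (K4, K4-e; 5): its complement is triangle-free, so no
# independent 4-set exists, and no 4 vertices carry 5 of the 6 possible edges.
c5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(is_coloring_type(5, c5, 4))  # True
```

The brute-force scan over all vertex subsets is only practical for the small n covered by this dataset; the enumeration of all non-isomorphic colorings is a much harder problem.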
These data were used to examine grammatical structures and patterns within a set of geospatial glossary definitions. The objectives of our study were to analyze the semantic structure of input definitions, use this information to build triple structures of RDF graph data, upload our lexicon to a knowledge graph software, and perform SPARQL queries on the data. Upon completion of this study, SPARQL queries were shown to effectively retrieve graph triples of semantic significance. These data represent and characterize the lexicon of our input text, which is used to form graph triples. These data were collected in 2024 by passing text through multiple Python programs utilizing spaCy (a natural language processing library) and its pre-trained English transformer pipeline. Before data was processed by the Python programs, input definitions were first rewritten as natural language and formatted as tabular data. Passages were then tokenized and characterized by their part-of-speech, tag, dependency relation, dependency head, and lemma. Each word within the lexicon was tokenized. A stop-words list was utilized only to remove punctuation and symbols from the text, excluding hyphenated words (e.g., bowl-shaped), which remained as such. The tokens' lemmas were then aggregated and totaled to find their recurrences within the lexicon. This procedure was repeated for tokenizing noun chunks using the same glossary definitions.
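The lemma-aggregation step can be sketched as follows. In the actual pipeline the (lemma, part-of-speech) pairs would come from spaCy's transformer pipeline; here they are stubbed by hand (the sample tokens are invented for illustration):

```python
from collections import Counter

# Hypothetical (lemma, POS) pairs standing in for spaCy token output.
tokens = [
    ("basin", "NOUN"), ("bowl-shaped", "ADJ"), ("depression", "NOUN"),
    ("basin", "NOUN"), (",", "PUNCT"), ("landform", "NOUN"),
]

# The stop-words list removed only punctuation and symbols; hyphenated words
# such as 'bowl-shaped' are kept intact.
kept = [lemma for lemma, pos in tokens if pos not in {"PUNCT", "SYM"}]

# Aggregate lemma recurrences within the lexicon.
lemma_counts = Counter(kept)
print(lemma_counts.most_common(2))
```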
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Statistics of datasets used in the experiments.
CC0 1.0 Public Domain: https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Anshuman_Tiwari2005
Released under CC0: Public Domain
DRAKO is a leader in providing Device Graph Data, focusing on understanding the relationships between consumer devices and identities. Our data allows businesses to create holistic profiles of users, track engagement across platforms, and measure the effectiveness of advertising efforts.
Device Graph Data is essential for accurate audience targeting, cross-device attribution, and understanding consumer journeys. By integrating data from multiple sources, we provide a unified view of user interactions, helping businesses make informed decisions.
Key Features: - Comprehensive device mapping to understand user behaviour across multiple platforms - Detailed Identity Graph Data for cross-device identification and engagement tracking - Integration with Connected TV Data for enhanced insights into video consumption habits - Mobile Attribution Data to measure the effectiveness of mobile campaigns - Customizable analytics to segment audiences based on device usage and demographics - Some ID types offered: AAID, IDFA, Unified ID 2.0, AFAI, MSAI, RIDA, AAID_CTV, IDFA_CTV
Use Cases: - Cross-device marketing strategies - Attribution modelling and campaign performance measurement - Audience segmentation and targeting - Enhanced insights for Connected TV advertising - Comprehensive consumer journey mapping
Data Compliance: All of our Device Graph Data is sourced responsibly and adheres to industry standards for data privacy and protection. We ensure that user identities are handled with care, providing insights without compromising individual privacy.
Data Quality: DRAKO employs robust validation techniques to ensure the accuracy and reliability of our Device Graph Data. Our quality assurance processes include continuous monitoring and updates to maintain data integrity and relevance.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For K6-e and Km-e graphs, a coloring of type (K6-e,Km-e;n) is an edge coloring of the complete graph Kn that contains no K6-e subgraph in the first color (absence of an edge) and no Km-e subgraph in the second color (presence of an edge). Km-e denotes the complete graph Km with one edge removed. The Ramsey number R(K6-e,Km-e) is the smallest natural number n such that every edge coloring of the complete graph Kn contains a subgraph isomorphic to K6-e in the first color or isomorphic to Km-e in the second color. Colorings of type (K6-e,Km-e;n) exist for n<R(K6-e,Km-e).
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Brayan Alejandro Valencia lopez
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SeaLiT Knowledge Graphs is an RDF dataset of maritime history data that has been transcribed (and then transformed) from original archival sources in the context of the SeaLiT Project (Seafaring Lives in Transition, Mediterranean Maritime Labour and Shipping, 1850s-1920s). The underlying data model is the SeaLiT Ontology, an extension of the ISO standard CIDOC-CRM (ISO 21127:2014) for the modelling and integration of maritime history information.
The knowledge graphs integrate data from a total of 16 different types of archival sources:
More information about the archival sources is available through the SeaLiT website. Data exploration applications over these sources are also publicly available (SeaLiT Catalogues, SeaLiT ResearchSpace).
Data from these archival sources has been transcribed in tabular form and then curated by historians of SeaLiT using the FAST CAT system. The transcripts (records), together with the curated vocabulary terms and entity instances (ships, persons, locations, organizations), are then transformed to RDF using the SeaLiT Ontology as the target (domain) model. To this end, the corresponding schema mappings between the original schemata and the ontology were defined using the X3ML mapping definition language; these mappings were subsequently used for delivering the RDF datasets.
More information about the FAST CAT system and the data transcription, curation and transformation processes can be found in the following paper:
P. Fafalios, K. Petrakis, G. Samaritakis, K. Doerr, A. Kritsotaki, Y. Tzitzikas, M. Doerr, "FAST CAT: Collaborative Data Entry and Curation for Semantic Interoperability in Digital Humanities", ACM Journal on Computing and Cultural Heritage, 2021. https://doi.org/10.1145/3461460
The RDF dataset is provided as a set of TriG files per record per archival source. For each record, the dataset provides: i) one TriG file for the record's data (records.trig), ii) one TriG file for the record's (curated) vocabulary terms (vocabularies.trig), and iii) four TriG files for the record's (curated) entity instances (ships.trig, persons.trig, locations.trig, organizations.trig).
We also provide the RDFS files of the used ontologies (SeaLiT Ontology version 1.0, CIDOC-CRM version 7.1.1).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
One table and 11 figures. Table 1 shows XLORE2 statistics. Figure 1 shows the framework of XLORE2. Figure 2 is an example of cross-lingual knowledge linking. Figure 3 presents the framework of cross-lingual knowledge linking. Figure 4 is an example of cross-lingual property matching (attribute matching). Figure 5 shows the framework of cross-lingual property matching. Figure 6 presents an example of mistakenly derived facts. Figure 7 is the framework of cross-lingual knowledge validation. Figure 8 shows an example of fine-grained type inference. Figure 9 depicts the framework of fine-grained type inference. Figure 10 is an illustration of XLink. Figure 11 shows the interface of XLORE2 and XLink.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Hungary - Distribution of population by household types: Single person was 13.80% in December of 2024, according to the EUROSTAT. Trading Economics provides the current actual value, a historical data chart and related indicators for Hungary - Distribution of population by household types: Single person - last updated from the EUROSTAT in November of 2025. Historically, Hungary - Distribution of population by household types: Single person reached a record high of 14.50% in December of 2017 and a record low of 9.20% in December of 2010.
This dataset was created by Thida Khim
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Currently, in the field of chart datasets, most existing resources are in English, and there are almost no open-source Chinese chart datasets, which limits research and applications related to Chinese charts. This dataset draws on the construction method of the DVQA dataset to create a chart dataset focused on the Chinese environment. To ensure the authenticity and practicality of the dataset, we first referred to the authoritative website of the National Bureau of Statistics and selected 24 data label categories widely used in practical applications, totaling 262 specific labels. These label categories cover multiple important areas such as socio-economic development, demographics, and industrial development. In addition, to further enhance the diversity and practicality of the dataset, we set 10 different numerical dimensions. These numerical dimensions not only provide a rich range of values but also include multiple types of values, which can simulate various data distributions and changes that may be encountered in real application scenarios. The dataset carefully covers various types of Chinese bar charts that may be encountered in practical applications. Specifically, it not only includes conventional vertical and horizontal bar charts but also introduces more challenging stacked bar charts to test the performance of methods on charts of different complexities. In addition, to further increase diversity and practicality, diverse attribute labels are set for each chart type. These attribute labels include, but are not limited to, whether the chart has data labels and whether the text is rotated 45° or 90°. These details make the dataset more realistic for real-world application scenarios, while also placing higher demands on data extraction methods.
In addition to the charts themselves, the dataset also provides corresponding data tables and title text for each chart, which is crucial for understanding the content of a chart and verifying the accuracy of extracted results. The dataset uses Matplotlib, the most popular and widely used data visualization library in the Python programming language, to generate the chart images required for research. Matplotlib has become the preferred tool for data scientists and researchers in data visualization tasks due to its rich features, flexible configuration options, and excellent compatibility. With Matplotlib, every detail of a chart can be precisely controlled, from the drawing of data points to the annotation of coordinate axes, and from the addition of legends to the setting of titles, ensuring that the generated chart images not only meet the research needs but are also highly readable and visually attractive. The dataset consists of 58712 pairs of Chinese bar charts and corresponding data tables, divided into training, validation, and testing sets in a 7:2:1 ratio.
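A 7:2:1 split of 58712 items does not divide evenly, so some rounding rule is needed. A minimal sketch of how such an index split is typically computed (the exact rounding, shuffling, and seed used by the dataset authors are assumptions; here the remainder goes to the training set):

```python
import random

def split_indices(n_items, ratios=(7, 2, 1), seed=0):
    """Shuffle item indices and split them according to the given ratios."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    total = sum(ratios)
    n_val = n_items * ratios[1] // total    # floor of the 2/10 share
    n_test = n_items * ratios[2] // total   # floor of the 1/10 share
    n_train = n_items - n_val - n_test      # remainder absorbed by training
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_indices(58712)
print(len(train), len(val), len(test))  # 41099 11742 5871
```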
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Graphs are a fundamental data structure, capable of representing complex association relationships in diverse domains. For large-scale graph processing, stream graphs have become efficient tools for processing dynamically evolving graph data. When processing stream graphs, subgraph counting is a key technique, which faces significant computational challenges due to its #P-complete nature. This work introduces StreamSC, a novel framework that efficiently estimates subgraph counting results on stream graphs through two key innovations: (i) it is the first learning-based framework to address the subgraph counting problem on stream graphs; and (ii) it addresses the challenges arising from dynamic changes of the data graph caused by the insertion or deletion of edges. Experiments on 5 real-world graphs show the superiority of StreamSC in accuracy and efficiency.
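StreamSC's learning-based estimator is not reproduced here. As a minimal illustration of the streaming setting it targets, the following sketch (class and names are ours, not from the paper) maintains an exact triangle count under edge insertions and deletions:

```python
from collections import defaultdict

class TriangleStream:
    """Running triangle count over an edge stream with inserts and deletes."""
    def __init__(self):
        self.adj = defaultdict(set)
        self.triangles = 0

    def insert(self, u, v):
        # Each common neighbour of u and v closes one new triangle.
        self.triangles += len(self.adj[u] & self.adj[v])
        self.adj[u].add(v)
        self.adj[v].add(u)

    def delete(self, u, v):
        # Remove the edge first, then subtract the triangles it was part of.
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        self.triangles -= len(self.adj[u] & self.adj[v])

s = TriangleStream()
for e in [(0, 1), (1, 2), (0, 2), (2, 3), (0, 3)]:
    s.insert(*e)
count_after_inserts = s.triangles  # triangles {0,1,2} and {0,2,3}
s.delete(0, 2)                     # removing (0,2) destroys both
print(count_after_inserts, s.triangles)
```

Exact maintenance like this costs time proportional to vertex degrees per update; for larger patterns than triangles the cost grows quickly, which is the motivation for learned estimators such as StreamSC.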
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Business process event data modeled as labeled property graphs
Data Format
-----------
The dataset comprises one labeled property graph in two different file formats.
#1) Neo4j .dump format
A Neo4j (https://neo4j.com) database dump that contains the entire graph and can be imported into a fresh Neo4j database instance using the following command (see also the Neo4j documentation: https://neo4j.com/docs/):
/bin/neo4j-admin.(bat|sh) load --database=graph.db --from=
The .dump was created with Neo4j v3.5.
#2) .graphml format
A .zip file containing a .graphml file of the entire graph
Data Schema
-----------
The graph is a labeled property graph over business process event data. Each graph uses the following concepts:
:Event nodes - each event node describes a discrete event, i.e., an atomic observation described by attribute "Activity" that occurred at the given "timestamp"
:Entity nodes - each entity node describes an entity (e.g., an object or a user), it has an EntityType and an identifier (attribute "ID")
:Log nodes - describes a collection of events that were recorded together, most graphs only contain one log node
:Class nodes - each class node describes a type of observation that has been recorded, e.g., the different types of activities that can be observed, :Class nodes group events into sets of identical observations
:CORR relationships - from :Event to :Entity nodes, describes whether an event is correlated to a specific entity; an event can be correlated to multiple entities
:DF relationships - "directly-followed by" between two :Event nodes describes which event is directly-followed by which other event; both events in a :DF relationship must be correlated to the same entity node. All :DF relationships form a directed acyclic graph.
:HAS relationship - from a :Log to an :Event node, describes which events had been recorded in which event log
:OBSERVES relationship - from an :Event to a :Class node, describes to which event class an event belongs, i.e., which activity was observed in the graph
:REL relationship - placeholder for any structural relationship between two :Entity nodes
The concepts are further defined in Stefan Esser, Dirk Fahland: Multi-Dimensional Event Data in Graph Databases. CoRR abs/2005.14552 (2020). https://arxiv.org/abs/2005.14552
Data Contents
-------------
neo4j-bpic15-2021-02-17 (.dump|.graphml.zip)
An integrated graph describing the raw event data of the entire BPI Challenge 2015 dataset.
van Dongen, B.F. (Boudewijn) (2015): BPI Challenge 2015. 4TU.ResearchData. Collection. https://doi.org/10.4121/uuid:31a308ef-c844-48da-948c-305d167a0ec1
This data is provided by five Dutch municipalities. The data contains all building permit applications over a period of approximately four years. There are many different activities present, denoted by both codes (attribute concept:name) and labels, both in Dutch (attribute taskNameNL) and in English (attribute taskNameEN). The cases in the log contain information on the main application as well as objection procedures in various stages. Furthermore, information is available about the resource that carried out the task and on the cost of the application (attribute SUMleges). The processes in the five municipalities should be identical, but may differ slightly; especially when changes are made to procedures, rules or regulations, the time at which these changes are pushed into the five municipalities may differ. Of course, over the four-year period, the underlying processes have changed. The municipalities have a number of questions, namely:
- What are the roles of the people involved in the various stages of the process, and how do these roles differ across municipalities?
- What are the possible points for improvement on the organizational structure for each of the municipalities?
- The employees of two of the five municipalities have physically moved into the same location recently. Did this lead to a change in the processes and, if so, what is different?
- Some of the procedures will be outsourced from 2018, i.e. they will be removed from the process and the applicant needs to have these activities performed by an external party before submitting the application. What will be the effect of this on the organizational structures in the five municipalities?
- Where are differences in throughput times between the municipalities, and how can these be explained?
- What are the differences in control flow between the municipalities?
There are five different log files available in this collection. Events are labeled with both a code and a Dutch and English label.
Each activity code consists of three parts: two digits, a variable number of characters, and then three digits. The first two digits as well as the characters indicate the subprocess the activity belongs to. For instance '01_HOOFD_xxx' indicates the main process and '01_BB_xxx' indicates the 'objections and complaints' ('Beroep en Bezwaar' in Dutch) subprocess. The last three digits hint at the order in which activities are executed, where the first digit often indicates a phase within a process. Each trace and each event contain several data attributes that can be used for various checks and predictions. Furthermore, some employees may have performed tasks for different municipalities, i.e. if the employee number is the same, it is safe to assume the same person is being identified.
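The code structure above lends itself to a simple parser. The following sketch (the concrete code '01_HOOFD_010' is a made-up example matching the documented pattern) splits an activity code into its subprocess and ordering parts:

```python
import re

# <2 digits>_<characters>_<3 digits>, per the documented activity-code format.
ACTIVITY_CODE = re.compile(r"^(\d{2})_([A-Za-z]+)_(\d{3})$")

def parse_activity_code(code):
    """Split an activity code into subprocess, phase, and order components."""
    m = ACTIVITY_CODE.match(code)
    if m is None:
        raise ValueError(f"not a valid activity code: {code!r}")
    prefix, subprocess, order = m.groups()
    return {
        "subprocess": f"{prefix}_{subprocess}",  # e.g. '01_HOOFD' = main process
        "phase": order[0],                       # first digit often marks a phase
        "order": order,
    }

print(parse_activity_code("01_HOOFD_010"))
```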
The data contains the following entities and their events
- Application - a building permit application handled in one of five Dutch municipalities
- Case_R - a user or worker involved in handling the application
- Responsible_actor - a user or worker designated as responsible actor for an activity
- monitoringResource - a user or worker designated as monitoring resource for an activity
The data contains 5 event log nodes as the data was integrated from 5 different event logs from 5 different systems.
Data Size
---------
BPIC15, nodes: 268851, relationships: 2620418
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sweden - Distribution of population by household types: Single person was 22.20% in December of 2024, according to the EUROSTAT. Trading Economics provides the current actual value, a historical data chart and related indicators for Sweden - Distribution of population by household types: Single person - last updated from the EUROSTAT in December of 2025. Historically, Sweden - Distribution of population by household types: Single person reached a record high of 24.10% in December of 2023 and a record low of 19.80% in December of 2012.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CLARA. This deposit is part of the CLARA project, which aims to empower teachers in the task of creating new educational resources, and in particular in handling the licenses of reused educational resources. The present deposit contains the RDF files created using an RDF mapping (RML) and a mapper (Morph-KGC). It also contains the JSON files used as input. The corresponding pipeline can be found on GitLab. The data used in that pipeline originate from X5GON, a European project aiming to generate and gather open educational resources.
Knowledge graph content
The present knowledge graph contains information about 45K educational resources (ERs) and 135K subjects (extracted from DBpedia). That information contains:
the author, its title and description, the license, a URL to the resource itself, the language of the ER, its mimetype, and finally which subjects it talks about and to what extent. That extent is given by two scores: a PageRank score and a cosine score. A particularity of the knowledge graph is its heavy use of RDF reification across large multi-valued properties; thus four versions of the knowledge graph exist, using standard reification, singleton property, named graphs, and RDF-star. The knowledge graph also contains categories originating from DBpedia; they help make precise the subjects that are also extracted from DBpedia. The KG.zip files contain five types of files:
- Authors_[X].nt - the authors' nodes, their type, and name.
- ER_[X].nt/nq/ttl - the ERs and their information using the respective RDF reification model.
- categories_skos_[X].ttl - the hierarchy of DBpedia categories.
- categories_labels.ttl - additional information about the categories.
- categories_article.ttl - the RDF triples that link the DBpedia subjects to the DBpedia categories.
JSON content
The original dataset was cut into multiple JSON files in order to make its processing easier. DBpedia categories were extracted as RDF and are not present in the JSON files. There are two types of files in the input-json.zip file:
- authors_[X].json - lists the authors' names.
- ER_[X].json - lists the ERs and their related information: their title, their description, their language (and language_detected; only the first one is used in the pipeline here), their license, their mimetype, the authors, the date of creation of the resource, a URL linking to the resource itself, and the subjects (named concepts) associated with the resource, with the corresponding scores.
If you do use this dataset, you can cite this repository:
Kieffer, M., Fakih, G., & Serrano Alvarado, P. (2023). CLARA Knowledge Graph of licensed educational resources [Data set]. Semantics, Leipzig, Germany. Zenodo. https://doi.org/10.5281/zenodo.8403142
Or the corresponding paper:
Kieffer, M., Fakih, G. & Serrano-Alvarado, P. (2023). Evaluating Reification with Multi-valued Properties in a Knowledge Graph of Licensed Educational Resources. Semantics, Leipzig, Germany.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
November 2020: Please check out the newer version of the OpenAIRE Research Graph dump available at https://doi.org/10.5281/zenodo.4201546. The newer version contains JSON files that are more compact and easier to process. Learn more about the OpenAIRE Research Graph at https://graph.openaire.eu.
The OpenAIRE Research Graph is exported as several dumps, so you can download the parts you are interested in.
Please go to http://develop.openaire.eu/graph-dumps.html for instructions on how to consume the dumps.
Libraries: this blog describes the openairegraph libraries, which can be used to perform analytics on this dataset.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
oag-cs, oag-eng, and oag-chem are new heterogeneous networks composed of subsets of the Open Academic Graph (OAG). Each of the datasets contains papers from three different subject domains: computer science, engineering, and chemistry. These datasets also contain four types of entities: papers, authors, institutions, and fields of study. Each paper is associated with a 768-dimensional feature vector generated from a pre-trained XLNet applied to the paper titles. The representation of each word in the title is weighted by each word's attention to obtain the title representation for each paper. Each paper node is labeled with its publication venue (journal or conference). We split the papers published up to 2016 as the training set, papers published in 2017 as the validation set, and papers published in 2018 and 2019 as the test set. The publication year of each paper is also included in these datasets, meaning those datasets can also be converted to use the publication year as class labels.
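The temporal split described above can be sketched as follows; the (paper_id, year) pairs are hypothetical stand-ins for OAG records:

```python
# Papers up to 2016 -> train, 2017 -> validation, 2018-2019 -> test.
papers = [("p1", 2014), ("p2", 2016), ("p3", 2017), ("p4", 2018), ("p5", 2019)]

train = [p for p, y in papers if y <= 2016]
val   = [p for p, y in papers if y == 2017]
test  = [p for p, y in papers if y >= 2018]
print(train, val, test)
```

Splitting by publication year rather than at random avoids temporal leakage: the model never sees papers from the future of its training data.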
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: As evaluation indices, cancer grading and subtyping have diverse clinical, pathological, and molecular characteristics with prognostic and therapeutic implications. Although researchers have begun to study cancer differentiation and subtype prediction, most relevant methods are based on traditional machine learning and rely on single-omics data. It is necessary to explore a deep learning algorithm that integrates multi-omics data to achieve classification prediction of cancer differentiation and subtypes.
Methods: This paper proposes a multi-omics data fusion algorithm based on a multi-view graph neural network (MVGNN) for predicting cancer differentiation and subtype classification. The model framework consists of a graph convolutional network (GCN) module for learning features from different omics data and an attention module for integrating multi-omics data. Three different types of omics data are used. For each type of omics data, feature selection is performed using methods such as the chi-square test and minimum redundancy maximum relevance (mRMR). Weighted patient similarity networks are constructed based on the selected omics features, and a GCN is trained using the omics features and corresponding similarity networks. Finally, an attention module integrates the different types of omics features and performs the final cancer classification prediction.
Results: To validate the cancer classification performance of the MVGNN model, we conducted experimental comparisons with traditional machine learning models and currently popular multi-omics integration methods using 5-fold cross-validation. Additionally, we performed comparative experiments on cancer differentiation and its subtypes based on single-omics, two-omics, and three-omics data.
Discussion: This paper proposed the MVGNN model, which performed well in cancer classification prediction based on multiple omics data.
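This is not the MVGNN implementation, but the attention-based fusion step can be illustrated in miniature: assume each omics view contributes a per-view GCN embedding plus a scalar relevance score that is softmax-normalised into fusion weights (both the scoring scheme and the numbers below are illustrative assumptions):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def fuse_views(view_features, view_scores):
    """Weighted sum of per-view feature vectors using attention weights."""
    weights = softmax(view_scores)
    dim = len(view_features[0])
    return [sum(w * f[i] for w, f in zip(weights, view_features))
            for i in range(dim)]

# Three omics views (e.g. mRNA, methylation, miRNA) with 2-d embeddings.
views = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused = fuse_views(views, view_scores=[2.0, 1.0, 0.5])
print(fused)
```

In the actual model the attention scores would themselves be learned from the view embeddings rather than supplied by hand.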
https://www.verifiedmarketresearch.com/privacy-policy/
The NoSQL Database Market was valued at USD 6.47 Billion in 2024 and is expected to reach USD 44.66 Billion by 2032, growing at a CAGR of 30.14% from 2026 to 2032.
Global NoSQL Database Market drivers:
- Exponential growth of Big Data and IoT: the explosion of Big Data and Internet of Things (IoT) applications is a primary catalyst for NoSQL adoption, requiring database solutions that, unlike rigid relational systems, can ingest and process colossal volumes of unstructured and semi-structured data from diverse sources like sensors, social media, and web logs.
- Increasing demand for real-time web and mobile applications: the surging demand for real-time web and mobile applications is significantly fueling the NoSQL market, as these modern applications require sub-millisecond latency and exceptionally high throughput to deliver a seamless user experience. NoSQL database types, particularly key-value stores and document databases, are architecturally optimized for rapid read/write operations and horizontal scaling.