Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Patient-drug-disease (PDD) Graph dataset, utilising Electronic medical records (EMRS) and biomedical Knowledge graphs. The novel framework to construct the PDD graph is described in the associated publication.PDD is an RDF graph consisting of PDD facts, where a PDD fact is represented by an RDF triple to indicate that a patient takes a drug or a patient is diagnosed with a disease. For instance, (pdd:274671, pdd:diagnosed, sepsis)Data files are in .nt N-Triple format, a line-based syntax for an RDF graph. These can be accessed via openly-available text edit software.diagnose_icd_information.nt - contains RDF triples mapping patients to diagnoses. For example:(pdd:18740, pdd:diagnosed, icd99592),where pdd:18740 is a patient entity, and icd99592 is the ICD-9 code of sepsis.drug_patients.nt- contains RDF triples mapping patients to drugs. For example:(pdd:18740, pdd:prescribed, aspirin),where pdd:18740 is a patient entity, and aspirin is the drug's name.Background:Electronic medical records contain multi-format electronic medical data that consist of an abundance of medical knowledge. Faced with patients' symptoms, experienced caregivers make the right medical decisions based on their professional knowledge, which accurately grasps relationships between symptoms, diagnoses and corresponding treatments. In the associated paper, we aim to capture these relationships by constructing a large and high-quality heterogenous graph linking patients, diseases, and drugs (PDD) in EMRs. Specifically, we propose a novel framework to extract important medical entities from MIMIC-III (Medical Information Mart for Intensive Care III) and automatically link them with the existing biomedical knowledge graphs, including ICD-9 ontology and DrugBank. The PDD graph presented in this paper is accessible on the Web via the SPARQL endpoint as well as in .nt format in this repository, and provides a pathway for medical discovery and applications, such as effective treatment recommendations.De-identificationIt is necessary to mention that MIMIC-III contains clinical information of patients. Although the protected health information was de-identifed, researchers who seek to use more clinical data should complete an on-line training course and then apply for the permission to download the complete MIMIC-III dataset: https://mimic.physionet.org/
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Germany Exports of maps, hydrographic or similar charts (printed) to Moldova was US$1.2 Thousand during 2015, according to the United Nations COMTRADE database on international trade. Germany Exports of maps, hydrographic or similar charts (printed) to Moldova - data, historical chart and statistics - was last updated on July of 2025.
According to INSPIRE transformed development plan “Ortsbaulinieplan Harthausen” of the municipality of Epfendorf based on an XPlanung dataset in version 5.0.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The datasets illustrate the dynamics for the discrete-time predator-prey model when the predator has two stages in its lifetime, namely, juvenile and adult. Figure-1 (datasets associated: fig-1-a-ex-2.txt, fig-1-a-ex-2.txt, and fig-1-a-ex-2.txt) in the associated manuscript (Ackleh et al., 2020) gives the time-series dynamics of the various outcomes (extinction, predator-free, coexistence states) of the model. These situations are described by example-2 in the manuscript. In addition, example-3 illustrates the qualitative change (bifurcation diagrams) in the dynamics of the prey and two stages of the predator densities as a function of adult predator survival probability. The outcome of these dynamics are given in figure-2 (datasets associated: fig-2-abc-ex-3.txt, fig-2-def-ex-3.txt, fig-2-ghi-ex-3.txt, and fig-2-jkl-ex-3.txt) in the manuscript which shows that the model may exhibit chaotic dynamics when the predator has stage-structure in the model while the unstructured predator and prey model did not show such behaviors in the model outcomes. This dataset supports the publication: Ackleh, A. S., Hossain, M. I., Veprauskas, A., & Zhang, A. (2020). Persistence of a Discrete-Time Predator-Prey Model with Stage-Structure in the Predator. Springer Proceedings in Mathematics & Statistics, 145–163. doi:10.1007/978-3-030-60107-2_6.
Graffiti is an urban phenomenon that is increasingly attracting the interest of the sciences. To the best of our knowledge, no suitable data corpora are available for systematic research until now. The Information System Graffiti in Germany project (Ingrid) closes this gap by dealing with graffiti image collections that have been made available to the project for public use. Within Ingrid, the graffiti images are collected, digitized and annotated. With this work, we aim to support the rapid access to a comprehensive data source on Ingrid targeted especially by researchers. In particular, we present IngridKG, an RDF knowledge graph of annotated graffiti, abides by the Linked Data and FAIR principles. We weekly update IngridKG by augmenting the new annotated graffiti to our knowledge graph. Our generation pipeline applies RDF data conversion, link discovery and data fusion approaches to the original data. The current version of IngridKG contains 460,640,154 triples and is linked to 3 other knowledge graphs by over 200,000 links. In our use case studies, we demonstrate the usefulness of our knowledge graph for different applications.
Salmonella pangenome graph and variant call data for 539,283 genomesDescription:Salmonella enterica causes human disease and decreases agricultural production. The overall goals of this project is to generate a large database of S. enterica variants with 539,283 samples and 236,069 features for applications in machine learning and genomics. We transformed single nucleotide polymorphism (SNP) data into reduced dimensional representations which are tolerant of missing data based on disentangled variational autoencoders. TFRecord files were made with custom Python scripts that parsed the variant call formats (VCF) into sparse tensors and combined them with the Salmonella In Silico Typing Resource (SISTR) serotype data.The data directory contains:The tar file of TFRecords: tfrecords.tar (103 GB). The TFRecords are organized first by how they were genotyped. mpileup records were created with Mpileup, and the gvg records were created with graph variant calling. In each of these directories batches of ~10,000 sequence reads named Sra10k_XX.tfrecord.gz (00--54). File Sra10k_99.tfrecord.gz contains incomplete SRAs. Each TFRecord contains the shape of the tensor, the indices of non-zero variants, sample name, serotype, and sparse values. Value 99 was assigned to '.' records.The file output.tar (11.4 TB) contains the .vcf files used to create the TFRecords above. The data in here is contained more succinctly in the TTFrecord format. This data will not normally be used.A tar file of metadata files for the samples, metadata (95 MB). Sequence read archive (SRA) accessions were downloaded using edirect/eutilities and saved as SraAccList.txt.esearch -db sra -query "txid28901[Organism:exp] AND (cluster_public[prop] AND 'biomol dna'[Properties] AND 'library layout paired'[Properties] AND 'platform illumina'[Properties] AND 'strategy wgs'[Properties] OR 'strategy wga'[Properties] OR 'strategy wcs'[Properties] OR 'strategy clone'[Properties] OR 'strategy finishing'[Properties] OR 'strategy validation'[Properties])" | efetch -format runinfo -mode xml | xtract -pattern Row -element Run > SraAccList.txtGoogle BigQuery was used to download metadata for the SRA accessions from the National Institute of Health (NIH).SELECT * FROM nih-sra-datastore.sra.metadata as metadata INNER JOIN {table_id} as leiacc ON metadata.acc = leiacc.accID;Files were processed into batches of ~10,000 and named Sra_completed_XX.csv (00--53).A VCF document mapping the TFRecord data to the positions in the graph subjected to the Type strain LT2: mapping/DRR452337.gvg.vcf-with_TFRecord_in_1st_column.txtScripts for creating and reading TFRecord data: code.reading_and_parsing_fns.py defines functions for converting VCFs of variants called using gvg to sparse tensors and makes the TFRecord files.gvg_to_tfrecord.py creates TFRecords from from the sparse tensors.Tutorial for using the TFRecords: Example_logistic_regression.mdPangenome graph files and references used for variant calling and genotyping: pangenome.refPlus100.fasta.gz which contains the genomes of the 101 Salmonella strains without plasmids used for construction of the pangenome graph.salm.100.NC_003197_v2.d2_complete.gfa.gz The complete 101 Salmonella strain pangenome graph in Graphical Fragment Assembly (GFA2) Format 2.0 including alt nodes used for genotypingsalm.100.NC_003197_v2.full.gfa.gz the full graph including alt nodes.salm.100.NC_003197_v2.full.vcf.gz A VCF of the file abovegenotyped.gvg.vcf the genotype calls in vcf formatpaths.txt the paths of the graphSCINet users: The data folder can be accessed/retrieved with valid SCINet account at this location: /LTS/ADCdatastorage/NAL/published/node28083194/See the SCINet File Transfer guide for more information on moving large files: https://scinet.usda.gov/guides/data/datatransferGlobus users: The files can also be accessed through Globus by following this data link. The user will need to log in to Globus in order to access this data. User accounts are free of charge with several options for signing on. Instructions for creating an account are on the login page.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Business process event data modeled as labeled property graphs
Data Format
-----------
The dataset comprises one labeled property graph in two different file formats.
#1) Neo4j .dump format
A neo4j (https://neo4j.com) database dump that contains the entire graph and can be imported into a fresh neo4j database instance using the following command, see also the neo4j documentation: https://neo4j.com/docs/
/bin/neo4j-admin.(bat|sh) load --database=graph.db --from=
The .dump was created with Neo4j v3.5.
#2) .graphml format
A .zip file containing a .graphml file of the entire graph
Data Schema
-----------
The graph is a labeled property graph over business process event data. Each graph uses the following concepts
:Event nodes - each event node describes a discrete event, i.e., an atomic observation described by attribute "Activity" that occurred at the given "timestamp"
:Entity nodes - each entity node describes an entity (e.g., an object or a user), it has an EntityType and an identifier (attribute "ID")
:Log nodes - describes a collection of events that were recorded together, most graphs only contain one log node
:Class nodes - each class node describes a type of observation that has been recorded, e.g., the different types of activities that can be observed, :Class nodes group events into sets of identical observations
:CORR relationships - from :Event to :Entity nodes, describes whether an event is correlated to a specific entity; an event can be correlated to multiple entities
:DF relationships - "directly-followed by" between two :Event nodes describes which event is directly-followed by which other event; both events in a :DF relationship must be correlated to the same entity node. All :DF relationships form a directed acyclic graph.
:HAS relationship - from a :Log to an :Event node, describes which events had been recorded in which event log
:OBSERVES relationship - from an :Event to a :Class node, describes to which event class an event belongs, i.e., which activity was observed in the graph
:REL relationship - placeholder for any structural relationship between two :Entity nodes
The concepts a further defined in Stefan Esser, Dirk Fahland: Multi-Dimensional Event Data in Graph Databases. CoRR abs/2005.14552 (2020) https://arxiv.org/abs/2005.14552
Data Contents
-------------
neo4j-bpic15-2021-02-17 (.dump|.graphml.zip)
An integrated graph describing the raw event data of the entire BPI Challenge 2015 dataset.
van Dongen, B.F. (Boudewijn) (2015): BPI Challenge 2015. 4TU.ResearchData. Collection. https://doi.org/10.4121/uuid:31a308ef-c844-48da-948c-305d167a0ec1
This data is provided by five Dutch municipalities. The data contains all building permit applications over a period of approximately four years. There are many different activities present, denoted by both codes (attribute concept:name) and labels, both in Dutch (attribute taskNameNL) and in English (attribute taskNameEN). The cases in the log contain information on the main application as well as objection procedures in various stages. Furthermore, information is available about the resource that carried out the task and on the cost of the application (attribute SUMleges). The processes in the five municipalities should be identical, but may differ slightly. Especially when changes are made to procedures, rules or regulations the time at which these changes are pushed into the five municipalities may differ. Of course, over the four year period, the underlying processes have changed. The municipalities have a number of questions, namely: What are the roles of the people involved in the various stages of the process and how do these roles differ across municipalities? What are the possible points for improvement on the organizational structure for each of the municipalities? The employees of two of the five municipalities have physically moved into the same location recently. Did this lead to a change in the processes and if so, what is different? Some of the procedures will be outsourced from 2018, i.e. they will be removed from the process and the applicant needs to have these activities performed by an external party before submitting the application. What will be the effect of this on the organizational structures in the five municipalities? Where are differences in throughput times between the municipalities and how can these be explained? What are the differences in control flow between the municipalities? There are five different log files available in this collection. Events are labeled with both a code and a Dutch and English label. Each activity code consists of three parts: two digits, a variable number of characters, and then three digits. The first two digits as well as the characters indicate the subprocess the activity belongs to. For instance ‘01_HOOFD_xxx’ indicates the main process and ‘01_BB_xxx’ indicates the ‘objections and complaints’ (‘Beroep en Bezwaar’ in Dutch) subprocess. The last three digits hint on the order in which activities are executed, where the first digit often indicates a phase within a process. Each trace and each event, contain several data attributes that can be used for various checks and predictions. Furthermore, some employees may have performed tasks for different municipalities, i.e. if the employee number is the same, it is safe to assume the same person is being identified.
The data contains the following entities and their events
- Application - a building permit application handled in one of five Dutch municipalities
- Case_R - a user or worker involved in handling the application
- Responsible_actor - a user or worker designated as responsible actor for an activity
- monitoringResource - a user or worker designated as monitoring resource for an activity
The data contains 5 event log nodes as the data was integrated from 5 different event logs from 5 different systems.
Data Size
---------
BPIC15, nodes: 268851, relationships: 2620418
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Romania Exports of maps, hydrographic or similar charts (printed) to Poland was US$10 during 2024, according to the United Nations COMTRADE database on international trade. Romania Exports of maps, hydrographic or similar charts (printed) to Poland - data, historical chart and statistics - was last updated on July of 2025.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Jordan: PISA science scores: The latest value from 2022 is 374.527 index points, a decline from 429.252 index points in 2018. In comparison, the world average is 449.005 index points, based on data from 78 countries. Historically, the average for Jordan from 2006 to 2022 is 409.863 index points. The minimum value, 374.527 index points, was reached in 2022 while the maximum of 429.252 index points was recorded in 2018.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The graph representation of the Anatomical Structures, Cell Types, plus Biomarkers (ASCT+B) table for Liver dataset.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Finland: Human Development Index (0 - 1): The latest value from 2023 is 0.948 points, an increase from 0.942 points in 2022. In comparison, the world average is 0.744 points, based on data from 185 countries. Historically, the average for Finland from 1980 to 2023 is 0.876 points. The minimum value, 0.752 points, was reached in 1980 while the maximum of 0.948 points was recorded in 2023.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Vietnam Imports from Japan of Maps, Hydrographic or Similar Charts (Printed) was US$957 during 2021, according to the United Nations COMTRADE database on international trade. Vietnam Imports from Japan of Maps, Hydrographic or Similar Charts (Printed) - data, historical chart and statistics - was last updated on August of 2025.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Malawi Imports of maps, hydrographic or similar charts (printed) from Ethiopia was US$247 during 2020, according to the United Nations COMTRADE database on international trade. Malawi Imports of maps, hydrographic or similar charts (printed) from Ethiopia - data, historical chart and statistics - was last updated on July of 2025.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Uruguay Imports from Netherlands of Maps, Hydrographic or Similar Charts (Printed) was US$110 during 2020, according to the United Nations COMTRADE database on international trade. Uruguay Imports from Netherlands of Maps, Hydrographic or Similar Charts (Printed) - data, historical chart and statistics - was last updated on July of 2025.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Malawi Imports from Burundi of Maps, Hydrographic or Similar Charts (Printed) was US$100 during 2009, according to the United Nations COMTRADE database on international trade. Malawi Imports from Burundi of Maps, Hydrographic or Similar Charts (Printed) - data, historical chart and statistics - was last updated on July of 2025.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
South Korea Imports from India of Maps, Hydrographic or Similar Charts (Printed) was US$2.58 Thousand during 2024, according to the United Nations COMTRADE database on international trade. South Korea Imports from India of Maps, Hydrographic or Similar Charts (Printed) - data, historical chart and statistics - was last updated on July of 2025.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Uruguay Imports from Italy of Organo-sulfur Compounds was US$4.08 Thousand during 2023, according to the United Nations COMTRADE database on international trade. Uruguay Imports from Italy of Organo-sulfur Compounds - data, historical chart and statistics - was last updated on July of 2025.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Luxembourg Imports from Netherlands of Maps, Hydrographic or Similar Charts (Printed) was US$235 during 2024, according to the United Nations COMTRADE database on international trade. Luxembourg Imports from Netherlands of Maps, Hydrographic or Similar Charts (Printed) - data, historical chart and statistics - was last updated on July of 2025.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Austria Imports from Slovenia of Nitrile-function Compounds was US$554 during 2024, according to the United Nations COMTRADE database on international trade. Austria Imports from Slovenia of Nitrile-function Compounds - data, historical chart and statistics - was last updated on July of 2025.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Spain Imports from Honduras of Maps, Hydrographic or Similar Charts (Printed) was US$333 during 2019, according to the United Nations COMTRADE database on international trade. Spain Imports from Honduras of Maps, Hydrographic or Similar Charts (Printed) - data, historical chart and statistics - was last updated on August of 2025.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Patient-drug-disease (PDD) Graph dataset, utilising Electronic medical records (EMRS) and biomedical Knowledge graphs. The novel framework to construct the PDD graph is described in the associated publication.PDD is an RDF graph consisting of PDD facts, where a PDD fact is represented by an RDF triple to indicate that a patient takes a drug or a patient is diagnosed with a disease. For instance, (pdd:274671, pdd:diagnosed, sepsis)Data files are in .nt N-Triple format, a line-based syntax for an RDF graph. These can be accessed via openly-available text edit software.diagnose_icd_information.nt - contains RDF triples mapping patients to diagnoses. For example:(pdd:18740, pdd:diagnosed, icd99592),where pdd:18740 is a patient entity, and icd99592 is the ICD-9 code of sepsis.drug_patients.nt- contains RDF triples mapping patients to drugs. For example:(pdd:18740, pdd:prescribed, aspirin),where pdd:18740 is a patient entity, and aspirin is the drug's name.Background:Electronic medical records contain multi-format electronic medical data that consist of an abundance of medical knowledge. Faced with patients' symptoms, experienced caregivers make the right medical decisions based on their professional knowledge, which accurately grasps relationships between symptoms, diagnoses and corresponding treatments. In the associated paper, we aim to capture these relationships by constructing a large and high-quality heterogenous graph linking patients, diseases, and drugs (PDD) in EMRs. Specifically, we propose a novel framework to extract important medical entities from MIMIC-III (Medical Information Mart for Intensive Care III) and automatically link them with the existing biomedical knowledge graphs, including ICD-9 ontology and DrugBank. The PDD graph presented in this paper is accessible on the Web via the SPARQL endpoint as well as in .nt format in this repository, and provides a pathway for medical discovery and applications, such as effective treatment recommendations.De-identificationIt is necessary to mention that MIMIC-III contains clinical information of patients. Although the protected health information was de-identifed, researchers who seek to use more clinical data should complete an on-line training course and then apply for the permission to download the complete MIMIC-III dataset: https://mimic.physionet.org/