10 datasets found

SeMRA Raw Semantic Mapping Database
zenodo.org
application/gzip, bin +1
Updated May 24, 2025
Cite
Charles Tapley Hoyt; Benjamin Gyori (2025). SeMRA Raw Semantic Mapping Database [Dataset]. http://doi.org/10.5281/zenodo.15504009
Available download formats: application/gzip, bin, sh
Unique identifier
https://doi.org/10.5281/zenodo.15504009
Dataset updated
May 24, 2025
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Charles Tapley Hoyt; Benjamin Gyori
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
An automatically assembled dataset of raw semantic mappings produced by python -m semra.database. This incorporates mappings from the following places:

Ontologies indexed in the Bioregistry (primary)

Databases integrated in PyOBO (primary)

Biomappings (secondary)

Wikidata (primary/secondary)

Custom resources integrated in SeMRA (primary)

This is a database of raw mappings without further processing. For processed mapping datasets, we suggest applying smaller, domain-specific processing rules (see https://github.com/biopragmatics/semra/tree/main/notebooks/landscape for examples). The raw mappings can be accessed directly via:

mappings.sssom.tsv.gz - loadable through any tools supporting SSSOM

mappings.jsonl.gz - loadable through SeMRA using semra.from_jsonl
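As a minimal sketch, the SSSOM export can be read with only the Python standard library, assuming the file follows the SSSOM TSV layout: `#`-prefixed YAML metadata lines, then a header row and tab-separated rows with columns such as `subject_id`, `predicate_id`, and `object_id`. The helper names `iter_sssom_rows` and `count_predicates` are hypothetical; dedicated SSSOM tooling (such as sssom-py) or `semra.from_jsonl` for the JSONL file are the more complete options.

```python
import csv
import gzip
from collections import Counter
from typing import Iterator


def iter_sssom_rows(path: str) -> Iterator[dict]:
    """Yield one dict per mapping row from a gzipped SSSOM TSV file.

    Assumes the SSSOM TSV layout: metadata lines starting with '#',
    followed by a header row and tab-separated data rows.
    """
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        # Drop the '#'-prefixed metadata block before the csv module sees it.
        body = (line for line in handle if not line.startswith("#"))
        yield from csv.DictReader(body, delimiter="\t")


def count_predicates(path: str) -> Counter:
    """Count how often each mapping predicate (e.g. skos:exactMatch) occurs."""
    return Counter(row["predicate_id"] for row in iter_sssom_rows(path))
```

For example, `count_predicates("mappings.sssom.tsv.gz").most_common(10)` would list the ten most frequent predicates. Rows are streamed lazily, which matters because a raw mapping database of this kind can be large.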

How to Run the Web App

Download all artifacts from this Record

Make sure that you have Docker running locally

Run sh run_on_docker.sh from the command line

Navigate to http://localhost:8773 to see the SeMRA dashboard or to http://localhost:7474 for direct access to the Neo4j graph database
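Because containers can take a while to start, a small standard-library helper can poll the two ports named above until they accept TCP connections. This is a hypothetical convenience sketch, not part of `run_on_docker.sh`; the timeout and retry interval are arbitrary assumptions.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)  # back off briefly before retrying


# Ports from the instructions above: SeMRA dashboard and Neo4j Browser.
SERVICES = {"SeMRA dashboard": 8773, "Neo4j Browser": 7474}
```

A typical use is `all(wait_for_port("localhost", p) for p in SERVICES.values())` before opening a browser. Note that an open port only shows the server is listening, not that the Neo4j database has finished loading.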

Licensing

Mappings are licensed according to their primary resources. These are explicitly annotated in the SSSOM file on each row (when available) and on the mapping set level in the Neo4j graph database artifacts.
Dataset used for "A Recommender System of Buggy App Checkers for App Store...
data.niaid.nih.gov
Updated Jun 28, 2021
Cite
Maria Gomez (2021). Dataset used for "A Recommender System of Buggy App Checkers for App Store Moderators" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5034291
Dataset updated
Jun 28, 2021
Dataset provided by
Maria Gomez
Romain Rouvoy
Lionel Seinturier
Martin Monperrus
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the dataset used for the paper "A Recommender System of Buggy App Checkers for App Store Moderators", published at the International Conference on Mobile Software Engineering and Systems (MOBILESoft) in 2015.

Dataset Collection

We built a dataset consisting of a random sample of Android app metadata and user reviews available on the Google Play Store in January and March 2014. Since the Google Play Store is continuously evolving (adding, removing, and/or updating apps), we collected the dataset twice. The dataset D1 contains the apps available in the Google Play Store in January 2014. We then created a new snapshot (D2) of the Google Play Store in March 2014.

The apps belong to the 27 different categories defined by Google (at the time of writing the paper) and the 4 predefined subcategories (free, paid, new_free, and new_paid). For each category-subcategory pair (e.g. tools-free, tools-paid, sports-new_free), we collected a maximum of 500 samples, resulting in a median of 1,978 apps per category.

For each app, we retrieved the following metadata: name, package, creator, version code, version name, number of downloads, size, upload date, star rating, star counting, and the set of permission requests.

In addition, for each app, we collected up to the latest 500 reviews posted by users in the Google Play Store. For each review, we retrieved its metadata: title, description, device, and version of the app. None of these fields were mandatory, so several reviews lack some of these details. From all the reviews attached to an app, we only considered those associated with the latest version of the app (i.e., we discarded unversioned and old-versioned reviews). This resulted in a corpus of 1,402,717 reviews (January 2014).

Dataset Stats

Some stats about the datasets:

D1 (Jan. 2014) contains 38,781 apps requesting 7,826 different permissions, and 1,402,717 user reviews.

D2 (Mar. 2014) contains 46,644 apps requesting 9,319 different permissions, and 1,361,319 user reviews.

Additional stats about the datasets are available here.

Dataset Description

To store the dataset, we created a graph database with Neo4j. The dataset therefore consists of a graph describing the apps as nodes and edges. We chose a graph database because graph visualization helps to identify connections among the data (e.g., clusters of apps sharing similar sets of permission requests).
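The clusters of apps sharing similar sets of permission requests can be quantified, for example, with the Jaccard similarity between permission sets. This sketch is an illustration under that assumption (the description does not say which similarity measure, if any, was used), and the package names and permissions below are made-up examples, not data from D1 or D2.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similar_pairs(app_permissions: dict, threshold: float = 0.5) -> list:
    """Return (app1, app2, score) for app pairs whose permission sets overlap enough."""
    pairs = []
    items = sorted(app_permissions.items())  # deterministic order by package name
    for (app1, perms1), (app2, perms2) in combinations(items, 2):
        score = jaccard(perms1, perms2)
        if score >= threshold:
            pairs.append((app1, app2, round(score, 3)))
    return pairs


# Hypothetical example data: package name -> set of requested permissions.
apps = {
    "com.example.flashlight": {"CAMERA", "WAKE_LOCK"},
    "com.example.torch": {"CAMERA", "WAKE_LOCK", "INTERNET"},
    "com.example.notes": {"WRITE_EXTERNAL_STORAGE"},
}
```

With these inputs, `similar_pairs(apps)` keeps only the flashlight/torch pair (2 shared of 3 distinct permissions). The pairwise loop is quadratic, so for tens of thousands of apps a blocking or MinHash step would be the usual next move.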

In particular, our dataset graph contains six types of nodes:

- APP nodes containing the metadata of each app,
- PERMISSION nodes describing permission types,
- CATEGORY nodes describing app categories,
- SUBCATEGORY nodes describing app subcategories,
- USER_REVIEW nodes storing user reviews,
- TOPIC nodes storing topics mined from user reviews (using LDA).

Furthermore, there are five types of relationships: four connect APP nodes to the PERMISSION, USER_REVIEW, CATEGORY, and SUBCATEGORY nodes, and one connects USER_REVIEW nodes to TOPIC nodes:

USES_PERMISSION relationships between APP and PERMISSION nodes

HAS_REVIEW between APP and USER_REVIEW nodes

HAS_TOPIC between USER_REVIEW and TOPIC nodes

BELONGS_TO_CATEGORY between APP and CATEGORY nodes

BELONGS_TO_SUBCATEGORY between APP and SUBCATEGORY nodes
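Against the Neo4j database, the permission-sharing question can be asked in Cypher using the labels and relationship types listed above. The queries are kept in Python strings so they can be passed to a driver session. The property names `package` (APP) and `name` (PERMISSION, CATEGORY) are assumptions about how nodes store their identifiers and should be checked against the actual database. Relationships are matched without direction because the description does not state it, and the `$param` placeholder syntax requires Neo4j 3.x, so these target the migrated 3.5.28 databases.

```python
# Cypher over the Google Play graph schema (APP, PERMISSION, CATEGORY, ...).
# Assumption: APP nodes have a 'package' property; PERMISSION and CATEGORY a 'name'.

SHARED_PERMISSIONS = """
MATCH (a:APP {package: $package})-[:USES_PERMISSION]-(p:PERMISSION)-[:USES_PERMISSION]-(other:APP)
WHERE other <> a
RETURN other.package AS package, count(DISTINCT p) AS shared
ORDER BY shared DESC
LIMIT $limit
"""

TOP_PERMISSIONS_PER_CATEGORY = """
MATCH (c:CATEGORY)-[:BELONGS_TO_CATEGORY]-(a:APP)-[:USES_PERMISSION]-(p:PERMISSION)
RETURN c.name AS category, p.name AS permission, count(DISTINCT a) AS apps
ORDER BY apps DESC
LIMIT $limit
"""


def shared_permissions_params(package: str, limit: int = 10) -> dict:
    """Build the parameter map for SHARED_PERMISSIONS, rejecting non-positive limits."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return {"package": package, "limit": limit}
```

With the official `neo4j` Python driver, this would be run as something like `session.run(SHARED_PERMISSIONS, shared_permissions_params("com.example.app"))`, where the package name is a placeholder.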

Dataset Files Info

Neo4j 2.0 Databases

googlePlayDB1-Jan2014_neo4j_2_0.rar

googlePlayDB2-Mar2014_neo4j_2_0.rar

We provide two Neo4j databases containing the two snapshots of the Google Play Store (January and March 2014). These are the original databases created for the paper. They were created with Neo4j 2.0, specifically the tool version 'Neo4j 2.0.0-M06 Community Edition' (the latest version available when the paper was implemented in 2014).

Neo4j 3.5 Databases

googlePlayDB1-Jan2014_neo4j_3_5_28.rar

googlePlayDB2-Mar2014_neo4j_3_5_28.rar

Neo4j 2.0 is now deprecated and is no longer available for download from the official Neo4j Download Center. We have therefore migrated the original databases (Neo4j 2.0) to Neo4j 3.5.28. The databases can be opened with the tool version 'Neo4j Community Edition 3.5.28', which can be downloaded from the official Neo4j Download page.

To open the databases with more recent versions of Neo4j, they must first be migrated to the corresponding version. Instructions about the migration process can be found in the Neo4j Migration Guide. The first time you connect to a database, Neo4j may request credentials. The username and password are: neo4j/neo4j
text2cypher-gpt4o-clean
huggingface.co
Updated May 23, 2024
Cite
Tomaž Bratanič (2024). text2cypher-gpt4o-clean [Dataset]. https://huggingface.co/datasets/tomasonjo/text2cypher-gpt4o-clean
Metadata available in Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
Dataset updated
May 23, 2024
Authors
Tomaž Bratanič
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Synthetic dataset created with GPT-4o

Synthetic dataset of text2cypher over 16 different graph schemas. Questions were generated using GPT-4-turbo, and the corresponding Cypher statements with gpt-4o using Chain of Thought. Only questions that return results when queried against the database are included. For more information visit: https://github.com/neo4j-labs/text2cypher/tree/main/datasets/synthetic_gpt4o_demodbs. The dataset is available as train.csv. Columns are the following:… See the full description on the dataset page: https://huggingface.co/datasets/tomasonjo/text2cypher-gpt4o-clean.
Small E-Commerce Site
zenodo.org
csv
Updated Jan 24, 2025
Cite
Francesco Cambria (2025). Small E-Commerce Site [Dataset]. http://doi.org/10.5281/zenodo.14728706
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.14728706
Dataset updated
Jan 24, 2025
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Francesco Cambria
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset was constructed as a small example of a graph mimicking an e-commerce site where people can also follow each other.

The files listed here can be used to build a property graph in Neo4j:

item.csv - contains the data for the Item nodes.

person.csv - contains the data for the Person nodes.

category.csv - contains the data for the Category nodes.

follow.csv - contains the data for the FOLLOW relationships from Person to Person nodes.

buy.csv - contains the data for the BUY relationships from Person to Item nodes.

reccomend.csv - contains the data for the RECOMMEND relationship from Person to Item nodes.

of.csv - contains the data for the OF relationship from Item to Category nodes.
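One way to build the property graph from these files is Neo4j's `LOAD CSV` clause. The sketch below generates one statement per file from the node labels and relationship directions listed above. The column names (`id` in node files, `src`/`dst` in relationship files) and the presence of a header row are hypothetical, because the CSV headers are not described here; the files are also assumed to sit in Neo4j's import directory.

```python
# Node files and their labels, as listed in the dataset description.
NODE_FILES = {"item.csv": "Item", "person.csv": "Person", "category.csv": "Category"}

# Relationship files: (relationship type, start label, end label).
REL_FILES = {
    "follow.csv": ("FOLLOW", "Person", "Person"),
    "buy.csv": ("BUY", "Person", "Item"),
    "reccomend.csv": ("RECOMMEND", "Person", "Item"),
    "of.csv": ("OF", "Item", "Category"),
}


def node_statement(filename: str, label: str) -> str:
    # Hypothetical key column 'id'; every CSV column is copied onto the node.
    return (
        f"LOAD CSV WITH HEADERS FROM 'file:///{filename}' AS row\n"
        f"MERGE (n:{label} {{id: row.id}})\n"
        "SET n += row"
    )


def rel_statement(filename: str, rel: str, start: str, end: str) -> str:
    # Hypothetical columns 'src' and 'dst' holding the endpoint node ids.
    return (
        f"LOAD CSV WITH HEADERS FROM 'file:///{filename}' AS row\n"
        f"MATCH (a:{start} {{id: row.src}}), (b:{end} {{id: row.dst}})\n"
        f"MERGE (a)-[:{rel}]->(b)"
    )


def import_script() -> list:
    """Node statements first, so relationship statements can MATCH their endpoints."""
    stmts = [node_statement(f, label) for f, label in NODE_FILES.items()]
    stmts += [rel_statement(f, *spec) for f, spec in REL_FILES.items()]
    return stmts
```

For larger graphs, creating an index on the `id` property of each label before running the relationship statements should speed up the `MATCH` lookups considerably.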

This data was used as the motivating example dataset in the paper "MINE GRAPH RULE: A New GQL Operator for Mining Association Rules in Property Graph Databases".
DataSheet1_Threat modelling in Internet of Things (IoT) environments using...
figshare.com
zip
Updated May 30, 2024
Cite
Marwa Salayma (2024). DataSheet1_Threat modelling in Internet of Things (IoT) environments using dynamic attack graphs.ZIP [Dataset]. http://doi.org/10.3389/friot.2024.1306465.s001
Available download formats: zip
Unique identifier
https://doi.org/10.3389/friot.2024.1306465.s001
Dataset updated
May 30, 2024
Dataset provided by
Frontiers
Authors
Marwa Salayma
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This work presents a threat modelling approach to represent changes to the attack paths through an Internet of Things (IoT) environment when the environment changes dynamically, that is, when new devices are added to or removed from the system or when whole sub-systems join or leave. The proposed approach investigates the propagation of threats using attack graphs, a popular attack modelling method. However, traditional attack-graph approaches have been applied in static environments that do not continuously change, such as enterprise networks, leading to static and usually very large attack graphs. In contrast, IoT environments are often characterised by dynamic change and interconnections; different topologies for different systems may interconnect with each other dynamically and outside the operator’s control. Such new interconnections lead to changes in the reachability amongst devices according to which their corresponding attack graphs change. This requires dynamic topology and attack graphs for threat and risk analysis. This article introduces an example scenario based on healthcare systems to motivate the work and illustrate the proposed approach. The proposed approach is implemented using a graph database management tool (GDBM), Neo4j, which is a popular tool for mapping, visualising, and querying the graphs of highly connected data. It is efficient in providing a rapid threat modelling mechanism, making it suitable for capturing security changes in the dynamic IoT environment. Our results show that our developed threat modelling approach copes with dynamic system changes that may occur in IoT environments and enables identifying attack paths, whilst allowing for system dynamics. The developed dynamic topology and attack graphs can cope with the changes in the IoT environment efficiently and rapidly by maintaining their associated graphs.
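The core idea, enumerating attack paths over a reachability graph and recomputing them as devices join or leave, can be sketched in a few lines of Python. This is an illustrative stand-in using a plain adjacency map and depth-first search, not the Neo4j-based implementation from the article; the device names are hypothetical.

```python
from collections import defaultdict


class ReachabilityGraph:
    """Directed 'can reach' relation between devices; attack paths follow edges."""

    def __init__(self):
        self.edges = defaultdict(set)

    def connect(self, src: str, dst: str) -> None:
        self.edges[src].add(dst)

    def remove_device(self, device: str) -> None:
        # Drop the device's outgoing edges and every edge pointing at it.
        self.edges.pop(device, None)
        for targets in self.edges.values():
            targets.discard(device)

    def attack_paths(self, entry: str, target: str) -> list:
        """All simple paths from an attacker's entry point to the target asset."""
        paths, stack = [], [(entry, [entry])]
        while stack:
            node, path = stack.pop()
            if node == target:
                paths.append(path)
                continue
            for nxt in sorted(self.edges[node]):
                if nxt not in path:  # keep paths simple (no revisited devices)
                    stack.append((nxt, path + [nxt]))
        return sorted(paths)
```

Adding, say, a wearable that reaches the target directly and then calling `attack_paths` again exposes the new path; `remove_device` reverses it. Enumerating all simple paths grows exponentially in dense graphs, which is one reason the article's emphasis on maintaining graphs incrementally matters.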
Event Graph of BPI Challenge 2019
data.4tu.nl
zip
Updated Apr 22, 2021
Cite
Dirk Fahland (2021). Event Graph of BPI Challenge 2019 [Dataset]. http://doi.org/10.4121/14169614.v1
Available download formats: zip
Unique identifier
https://doi.org/10.4121/14169614.v1
Dataset updated
Apr 22, 2021
Dataset provided by
4TU.ResearchData
Authors
Dirk Fahland
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Business process event data modeled as labeled property graphs

Data Format
-----------

The dataset comprises one labeled property graph in two different file formats.

#1) Neo4j .dump format

A Neo4j (https://neo4j.com) database dump that contains the entire graph and can be imported into a fresh Neo4j database instance by running the following command from the Neo4j installation directory (see also the Neo4j documentation: https://neo4j.com/docs/):

bin/neo4j-admin load --database=graph.db --from=<path-to-dump-file>

On Windows, use bin\neo4j-admin.bat instead.

The .dump was created with Neo4j v3.5.

#2) .graphml format

A .zip file containing a .graphml file of the entire graph
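The .graphml export can be inspected without Neo4j. This standard-library sketch streams the file and counts nodes and edges, grouped by a node `labels` attribute and an edge `label` attribute. Those attribute names are an assumption (they follow the APOC-style GraphML export convention) and should be verified against the actual file; elements without them are counted under `<none>`.

```python
import xml.etree.ElementTree as ET
from collections import Counter

GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def summarize_graphml(path: str) -> tuple:
    """Stream a GraphML file; count nodes by 'labels' and edges by 'label'."""
    nodes, edges = Counter(), Counter()
    for _, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == GRAPHML_NS + "node":
            nodes[elem.get("labels", "<none>")] += 1
            elem.clear()  # drop child <data> elements to limit memory use
        elif elem.tag == GRAPHML_NS + "edge":
            edges[elem.get("label", "<none>")] += 1
            elem.clear()
    return nodes, edges
```

The file can be extracted first, or `zipfile.ZipFile(...).open(...)` can supply a file object, since `iterparse` accepts either a path or an open binary file.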

Data Schema
-----------

The graph is a labeled property graph over business process event data. Each graph uses the following concepts:

:Event nodes - each event node describes a discrete event, i.e., an atomic observation described by attribute "Activity" that occurred at the given "timestamp"

:Entity nodes - each entity node describes an entity (e.g., an object or a user), it has an EntityType and an identifier (attribute "ID")

:Log nodes - each log node describes a collection of events that were recorded together; most graphs only contain one log node

:Class nodes - each class node describes a type of observation that has been recorded, e.g., the different types of activities that can be observed, :Class nodes group events into sets of identical observations

:CORR relationships - from :Event to :Entity nodes, describes whether an event is correlated to a specific entity; an event can be correlated to multiple entities

:DF relationships - "directly-followed by" between two :Event nodes describes which event is directly-followed by which other event; both events in a :DF relationship must be correlated to the same entity node. All :DF relationships form a directed acyclic graph.

:HAS relationship - from a :Log to an :Event node, describes which events had been recorded in which event log

:OBSERVES relationship - from an :Event to a :Class node, describes to which event class an event belongs, i.e., which activity was observed in the graph

:REL relationship - placeholder for any structural relationship between two :Entity nodes
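The :DF relationships can be derived from :Event timestamps and :CORR relationships: for each entity, order its correlated events by timestamp and link each event to the next. The Python sketch below illustrates that construction on in-memory records; the function name and input shapes are hypothetical, and details such as tie-breaking on equal timestamps may differ from how the published graph was built in Neo4j.

```python
from collections import defaultdict


def derive_df(events: dict, corr: list) -> list:
    """Derive directly-follows edges per entity.

    events: event id -> timestamp (any comparable value, e.g. datetime)
    corr:   (event id, entity id) pairs, i.e. the :CORR relationships
    Returns (entity id, earlier event id, later event id) triples.
    """
    by_entity = defaultdict(list)
    for event_id, entity_id in corr:
        by_entity[entity_id].append(event_id)

    df = []
    for entity_id, event_ids in sorted(by_entity.items()):
        # Order by timestamp; ties are broken by event id (an assumption).
        ordered = sorted(event_ids, key=lambda e: (events[e], e))
        for earlier, later in zip(ordered, ordered[1:]):
            df.append((entity_id, earlier, later))
    return df
```

Because every derived edge points forward in one global (timestamp, event id) order, the edges cannot form a cycle, which is consistent with the DAG property stated above. An event correlated to several entities (for example, a purchase order and one of its items) takes part in one DF chain per entity.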

The concepts are further defined in Stefan Esser, Dirk Fahland: Multi-Dimensional Event Data in Graph Databases. CoRR abs/2005.14552 (2020) https://arxiv.org/abs/2005.14552

Data Contents
-------------

neo4j-bpic19-2021-02-17 (.dump|.graphml.zip)

An integrated graph describing the raw event data of the entire BPI Challenge 2019 dataset.
van Dongen, B.F. (Boudewijn) (2019): BPI Challenge 2019. 4TU.ResearchData. Collection. https://doi.org/10.4121/uuid:d06aff4b-79f0-45e6-8ec8-e19730c248f1

This data originated from a large multinational company operating from The Netherlands in the area of coatings and paints and we ask participants to investigate the purchase order handling process for some of its 60 subsidiaries. In particular, the process owner has compliance questions. In the data, each purchase order (or purchase document) contains one or more line items. For each line item, there are roughly four types of flows in the data: (1) 3-way matching, invoice after goods receipt: For these items, the value of the goods receipt message should be matched against the value of an invoice receipt message and the value put during creation of the item (indicated by both the GR-based flag and the Goods Receipt flags set to true). (2) 3-way matching, invoice before goods receipt: Purchase items that do require a goods receipt message, while they do not require GR-based invoicing (indicated by the GR-based IV flag set to false and the Goods Receipt flag set to true). For such purchase items, invoices can be entered before the goods are received, but they are blocked until the goods are received. This unblocking can be done by a user, or by a batch process at regular intervals. Invoices should only be cleared if goods are received and the value matches with the invoice and the value at creation of the item. (3) 2-way matching (no goods receipt needed): For these items, the value of the invoice should match the value at creation (in full or partially until the PO value is consumed), but there is no separate goods receipt message required (indicated by both the GR-based flag and the Goods Receipt flags set to false). (4) Consignment: For these items, there are no invoices on PO level as this is handled fully in a separate process. Here the GR indicator is set to true but the GR-based IV flag is set to false, and we also know by item type (consignment) that we do not expect an invoice against this item.
Unfortunately, the complexity of the data goes further than just this division in four categories. For each purchase item, there can be many goods receipt messages and corresponding invoices which are subsequently paid. Consider for example the process of paying rent. There is a Purchase Document with one item for paying rent, but a total of 12 goods receipt messages with (cleared) invoices with a value equal to 1/12 of the total amount. For logistical services, there may even be hundreds of goods receipt messages for one line item. Overall, for each line item, the amounts of the line item, the goods receipt messages (if applicable) and the invoices have to match for the process to be compliant. Of course, the log is anonymized, but some semantics are left in the data, for example: The resources are split between batch users and normal users indicated by their name. The batch users are automated processes executed by different systems. The normal users refer to human actors in the process. The monetary values of each event are anonymized from the original data using a linear translation respecting 0, i.e. addition of multiple invoices for a single item should still lead to the original item worth (although there may be small rounding errors for numerical reasons). Company, vendor, system and document names and IDs are anonymized in a consistent way throughout the log. The company has the key, so any result can be translated by them to business insights about real customers and real purchase documents.
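The four flow types are determined by the two flags plus the item type, and compliance requires that line-item, goods-receipt, and invoice amounts match. The sketch below encodes that decision table and a simplified compliance check for fully processed items; the function names and flow labels are hypothetical, they are not claimed to match the literal attribute values in the log, and the default tolerance (meant to absorb the rounding errors mentioned above) is an arbitrary assumption.

```python
def item_flow(gr_based_iv: bool, goods_receipt: bool, item_type: str) -> str:
    """Map the GR-based IV flag, Goods Receipt flag, and item type to a flow."""
    if goods_receipt and not gr_based_iv and item_type.lower() == "consignment":
        return "consignment"
    if goods_receipt and gr_based_iv:
        return "3-way, invoice after GR"
    if goods_receipt and not gr_based_iv:
        return "3-way, invoice before GR"
    if not goods_receipt and not gr_based_iv:
        return "2-way"
    return "unclassified"  # GR-based IV without goods receipt is not described


def amounts_match(item_value: float, gr_values: list, invoice_values: list,
                  flow: str, tol: float = 0.01) -> bool:
    """Simplified compliance check for a fully processed line item."""
    invoiced = sum(invoice_values)
    if flow == "consignment":
        return not invoice_values  # invoicing happens in a separate process
    if flow == "2-way":
        return abs(invoiced - item_value) <= tol  # no goods receipt required
    received = sum(gr_values)
    return abs(received - item_value) <= tol and abs(invoiced - item_value) <= tol
```

The rent example above maps directly onto this: one line item with 12 goods receipts and 12 invoices, each worth 1/12 of the item value, passes the 3-way check, while a missing receipt makes it fail.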

The case ID is a combination of the purchase document and the purchase item. There is a total of 76,349 purchase documents containing in total 251,734 items, i.e. there are 251,734 cases. In these cases, there are 1,595,923 events relating to 42 activities performed by 627 users (607 human users and 20 batch users). Sometimes the user field is empty, or NONE, which indicates no user was recorded in the source system. For each purchase item (or case) the following attributes are recorded: concept:name: A combination of the purchase document id and the item id, Purchasing Document: The purchasing document ID, Item: The item ID, Item Type: The type of the item, GR-Based Inv. Verif.: Flag indicating if GR-based invoicing is required (see above), Goods Receipt: Flag indicating if 3-way matching is required (see above), Source: The source system of this item, Doc. Category name: The name of the category of the purchasing document, Company: The subsidiary of the company from where the purchase originated, Spend classification text: A text explaining the class of purchase item, Spend area text: A text explaining the area for the purchase item, Sub spend area text: Another text explaining the area for the purchase item, Vendor: The vendor to which the purchase document was sent, Name: The name of the vendor, Document Type: The document type, Item Category: The category as explained above (3-way with GR-based invoicing, 3-way without, 2-way, consignment).

The data contains the following entities and their events

- PO - Purchase Order documents handled at a large multinational company operating from The Netherlands
- POItem - an item in a Purchase Order document describing a specific item to be purchased
- Resource - the user or worker handling the document or a specific item
- Vendor - the external organization from which an item is to be purchased

Data Size
---------

BPIC19: 1,926,651 nodes, 15,082,099 relationships
Rediscovery Datasets: Connecting Duplicate Reports of Apache, Eclipse, and...
zenodo.org
data.niaid.nih.gov
bin, csv, png, txt +1
Updated Aug 3, 2024
Cite
Mefta Sadat; Ayse Basar Bener; Andriy V. Miranskyy (2024). Rediscovery Datasets: Connecting Duplicate Reports of Apache, Eclipse, and KDE [Dataset]. http://doi.org/10.5281/zenodo.400614
Available download formats: csv, bin, zip, txt, png
Unique identifier
https://doi.org/10.5281/zenodo.400614
Dataset updated
Aug 3, 2024
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Mefta Sadat; Ayse Basar Bener; Andriy V. Miranskyy
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present three defect rediscovery datasets mined from Bugzilla. The datasets capture data for three groups of open source software projects: Apache, Eclipse, and KDE. The datasets contain information about approximately 914 thousand defect reports over a period of 18 years (1999-2017), capturing the inter-relationships among duplicate defects.

File Descriptions

apache.csv - Apache Defect Rediscovery dataset

eclipse.csv - Eclipse Defect Rediscovery dataset

kde.csv - KDE Defect Rediscovery dataset

apache.relations.csv - Inter-relations of rediscovered defects of Apache

eclipse.relations.csv - Inter-relations of rediscovered defects of Eclipse

kde.relations.csv - Inter-relations of rediscovered defects of KDE

create_and_populate_neo4j_objects.cypher - Populates the Neo4j graph database by importing all the data from the CSV files. Note that you have to set the dbms.import.csv.legacy_quote_escaping configuration setting to false to load the CSV files, as per https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/#config_dbms.import.csv.legacy_quote_escaping

create_and_populate_mysql_objects.sql - Populates MySQL RDBMS by importing all the data from the CSV files

rediscovery_db_mysql.zip - For your convenience, we also provide a full backup of the MySQL database

neo4j_examples.txt - Sample Neo4j queries

mysql_examples.txt - Sample MySQL queries

rediscovery_eclipse_6325.png - Output of Neo4j example #1

distinct_attrs.csv - Distinct values of bug_status, resolution, priority, severity for each project
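The *.relations.csv files link rediscovered (duplicate) defects to one another, so grouping reports into rediscovery clusters is a connected-components problem. A union-find sketch follows. The column names `bug_id` and `related_bug_id` are hypothetical, because the CSV headers are not described here, and should be adjusted to the actual files.

```python
import csv
from collections import defaultdict


class DisjointSet:
    """Union-find with path halving, used to merge duplicate defect reports."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def rediscovery_groups(rows, left="bug_id", right="related_bug_id"):
    """Return sorted groups of report ids connected through relation rows."""
    ds = DisjointSet()
    for row in rows:
        ds.union(row[left], row[right])
    groups = defaultdict(list)
    for bug in ds.parent:
        groups[ds.find(bug)].append(bug)
    return sorted(sorted(g) for g in groups.values())


def load_relations(path):
    """Read a relations CSV into a list of dicts (one per relation row)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, `rediscovery_groups(load_relations("eclipse.relations.csv"))` would return each cluster of mutually linked Eclipse reports. The same grouping could be computed in Neo4j or MySQL from the provided scripts; this version only needs the CSV file.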
DT4GS Knowledge Graph
zenodo.org
bin
Updated Jun 23, 2025
Cite
Marco Torre; Valerio Paolini; eros manzo (2025). DT4GS Knowledge Graph [Dataset]. http://doi.org/10.5281/zenodo.15716836
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.15716836
Dataset updated
Jun 23, 2025
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Marco Torre; Valerio Paolini; eros manzo
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The Knowledge Graph (KG) is designed with a focus on IMO number, ship name, and ship type as central identifiers, ensuring that every node representing a vessel is anchored by these unique values. This design choice allows for a clear and consistent visual representation where each ship is easily identifiable, and all associated data is linked directly to these identifiers.
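Since the IMO number anchors every vessel node, it is worth validating before loading. IMO ship identification numbers have seven digits, the last being a check digit: multiply the first six digits by 7, 6, 5, 4, 3, and 2, add the products, and compare the last digit of the sum with the check digit. The function name below is hypothetical, and the description does not say whether the KG applies such a check.

```python
def is_valid_imo(value) -> bool:
    """Check an IMO ship identification number via its check digit.

    Accepts ints or strings such as '9074729' or 'IMO 9074729'.
    """
    digits = str(value).strip().upper()
    if digits.startswith("IMO"):
        digits = digits[3:].strip()
    if len(digits) != 7 or not digits.isdigit():
        return False
    # Weights 7, 6, 5, 4, 3, 2 for the first six digits.
    weighted = sum(int(d) * w for d, w in zip(digits[:6], range(7, 1, -1)))
    return weighted % 10 == int(digits[6])
```

For 9074729 the weighted sum is 9·7 + 0·6 + 7·5 + 4·4 + 7·3 + 2·2 = 139, whose last digit 9 matches the check digit. Rows that fail the check could be set aside before a keyed `MERGE` so that a typo does not create a second node for the same vessel.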

In the Neo4j graph, the visualization is structured so that nodes and relationships form an interconnected web of information. The graph includes nodes for CO₂ emissions, fuel usage, KPIs, time information, and basic registry details. The right-hand panel in the Neo4j interface is particularly useful; it displays detailed properties of any selected node. For example, when a user selects a node representing CO₂ emissions, the panel highlights specific data points such as annual CO₂ emissions per distance traveled and per unit of work performed. This detailed view is essential for understanding the environmental performance of a vessel. It provides stakeholders with an at-a-glance summary of key performance metrics and allows them to drill down into the data for further analysis.

The visual structure of the Knowledge Graph also reinforces the interconnected nature of the data. The graph not only displays nodes but also shows the relationships between them, which represent how different aspects of a vessel’s performance are linked. For instance, a ship node (BasicInfo) is visually connected to multiple other nodes, such as Fuel, CO₂, TimeInfo, KPI, and CargoTransport. Since IMO number, ship name, and ship type serve as the primary unique identifiers, they act as the backbone that links all related data tables and relationships within the graph. This interconnected layout makes it immediately apparent which vessels have similar performance profiles, and which operational metrics are closely correlated.

By leveraging Neo4j’s powerful visualization capabilities, the KG transforms static data into a dynamic and interactive tool for real-time decision-making.
translated_text2cypher24_trainset_sampled
huggingface.co
Cite
MGO, translated_text2cypher24_trainset_sampled [Dataset]. https://huggingface.co/datasets/mgoNeo4j/translated_text2cypher24_trainset_sampled
Authors
MGO
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Translated Text2Cypher'24 Training Set - Sampled & Multilingual

This dataset provides a sampled and translated training set based on the Neo4j Text2Cypher '24 dataset. It is designed to support research on multilingual natural language to Cypher query generation. We offer two versions of the training set:

1. Multilingual Version (multilang)

Total examples: ~36,000
Languages: English (en), Spanish (es), Turkish (tr)
Samples per language: ~12,000
Translation… See the full description on the dataset page: https://huggingface.co/datasets/mgoNeo4j/translated_text2cypher24_trainset_sampled.
Scaling Ecommerce Graphs
zenodo.org
zip
Updated Jan 24, 2025
Cite
Francesco Cambria (2025). Scaling Ecommerce Graphs [Dataset]. http://doi.org/10.5281/zenodo.14728774
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.14728774
Dataset updated
Jan 24, 2025
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Francesco Cambria
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This folder contains all the datasets used for the performance evaluation of the MINE GRAPH RULE operator proposed in the paper "MINE GRAPH RULE: A New GQL Operator for Mining Association Rules in Property Graph Databases".

Each folder contains the following files used to create a property graph in Neo4j with a fixed schema mimicking an e-commerce site.

Item.csv - contains the data for the Item nodes.

Person.csv - contains the data for the Person nodes.

Category.csv - contains the data for the Category nodes.

FOLLOW.csv - contains the data for the FOLLOW relationships from Person to Person nodes.

BUY.csv - contains the data for the BUY relationships from Person to Item nodes.

RECOMMEND.csv - contains the data for the RECOMMEND relationship from Person to Item nodes.

OF.csv - contains the data for the OF relationship from Item to Category nodes.

The folders contain various graph instances with differing dimensions, and each folder is named to reflect its defining features. The features in the name are given in this order:

Total number of nodes within the graph.

Ratio of the number of Person nodes over the nodes with other labels.

Probability of having a relationship FOLLOW between two Person nodes.

Probability of having a relationship BUY between a Person node and an Item node.

Probability of having a relationship RECOMMEND between a Person node and an Item node.

(Example: the folder 10000_0.5_0.0005_0.1_0.0005_dataset contains files of a graph with 10000 nodes, of which half of them are Person nodes, 0.0005 is the probability of having a relationship FOLLOW between two Person nodes, 0.1 is the probability of having a relationship BUY between a Person node and an Item node, and 0.0005 is the probability of having a relationship RECOMMEND between a Person node and an Item node).
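The naming convention above can be parsed mechanically, for instance to sort graph instances by size when running scaling experiments. A minimal sketch; the `GraphConfig` field names are hypothetical. The second field is kept as a raw number because the description calls it a ratio of Person nodes to other nodes, while the example reads 0.5 as "half of the nodes".

```python
from typing import NamedTuple


class GraphConfig(NamedTuple):
    total_nodes: int
    person_ratio: float
    p_follow: float
    p_buy: float
    p_recommend: float


def parse_folder_name(name: str) -> GraphConfig:
    """Parse e.g. '10000_0.5_0.0005_0.1_0.0005_dataset' into its five parameters."""
    parts = name.split("_")
    if len(parts) != 6 or parts[-1] != "dataset":
        raise ValueError(f"unexpected folder name: {name!r}")
    total, ratio, follow, buy, recommend = parts[:5]
    return GraphConfig(int(total), float(ratio), float(follow), float(buy), float(recommend))
```

For example, `sorted(os.listdir(root), key=lambda n: parse_folder_name(n).total_nodes)` would order the dataset folders from smallest to largest graph, assuming `root` contains only these folders.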