Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The dataset is a Neo4j knowledge graph based on TMF Business Process Framework v22.0 data.
CSV files contain data about the model entities, and the JSON file contains knowledge graph mapping.
The script used to generate CSV files based on the XML model can be found here.
To import the dataset, download the zip archive and upload it to Neo4j.
You also can check this dataset here.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present three defect rediscovery datasets mined from Bugzilla. The datasets capture data for three groups of open source software projects: Apache, Eclipse, and KDE. The datasets contain information about approximately 914 thousands of defect reports over a period of 18 years (1999-2017) to capture the inter-relationships among duplicate defects.
File Descriptions
apache.csv - Apache Defect Rediscovery dataset
eclipse.csv - Eclipse Defect Rediscovery dataset
kde.csv - KDE Defect Rediscovery dataset
apache.relations.csv - Inter-relations of rediscovered defects of Apache
eclipse.relations.csv - Inter-relations of rediscovered defects of Eclipse
kde.relations.csv - Inter-relations of rediscovered defects of KDE
create_and_populate_neo4j_objects.cypher - Populates Neo4j graphDB by importing all the data from the CSV files. Note that you have to set dbms.import.csv.legacy_quote_escaping configuration setting to false to load the CSV files as per https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/#config_dbms.import.csv.legacy_quote_escaping
create_and_populate_mysql_objects.sql - Populates MySQL RDBMS by importing all the data from the CSV files
rediscovery_db_mysql.zip - For your convenience, we also provide full backup of the MySQL database
neo4j_examples.txt - Sample Neo4j queries
mysql_examples.txt - Sample MySQL queries
rediscovery_eclipse_6325.png - Output of Neo4j example #1
distinct_attrs.csv - Distinct values of bug_status, resolution, priority, severity for each project
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TBRNAT was developed to amalgamate information from peer-reviewed publications related to the regulation of M. tuberculosis transcription factors on gene expression and provide users with an intuitive way to visualize this data. The data provided is in text form and is meant to be used by a graph database such as Neo4j. There are two files: a TbRNAT.csv file and a TBREGNET.csv file. The former contains labels for each transcription factor in the database. The column titles are self-explanatory with the exception of 'id' which is an internal id created for use with Neo4j. The latter file contains 'sourceId' and 'targetId' which are also Neo4j identifiers. 'totalAssociations' is the number of direct associations ("edges") for a particular transcription factor. 'pmid' = PubMed identifier referencing where the data was obtained
Facebook
TwitterOpen Data Commons Attribution License (ODC-By) v1.0https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
As important carriers of innovation activities, patents, sci-tech achievements and papers play an increasingly prominent role in national political and economic development under the background of a new round of technological revolution and industrial transformation. However, in a distributed and heterogeneous environment, the integration and systematic description of patents, sci-tech achievements and papers data are still insufficient, which limits the in-depth analysis and utilization of related data resources. The dataset of knowledge graph construction for patents, sci-tech achievements and papers is an important means to promote innovation network research, and is of great significance for strengthening the development, utilization, and knowledge mining of innovation data. This work collected data on patents, sci-tech achievements and papers from China's authoritative websites spanning the three major industries—agriculture, industry, and services—during the period 2022-2025. After processes of cleaning, organizing, and normalization, a patents-sci-tech achievements-papers knowledge graph dataset was formed, containing 10 entity types and 8 types of entity relationships. To ensure quality and accuracy of data, the entire process involved strict preprocessing, semantic extraction and verification, with the ontology model introduced as the schema layer of the knowledge graph. The dataset establishes direct correlations among patents, sci-tech achievements and papers through inventors/contributors/authors, and utilizes the Neo4j graph database for storage and visualization. The open dataset constructed in this study can serve as important foundational data for building knowledge graphs in the field of innovation, providing structured data support for innovation activity analysis, scientific research collaboration network analysis and knowledge discovery.The dataset consists of two parts. The first part includes three Excel tables: 1,794 patent records with 10 fields, 181 paper records with 7 fields, and 1,156 scientific and technological achievement records with 11 fields. The second part is a knowledge graph dataset in CSV format that can be imported into Neo4j, comprising 10 entity files and 8 relationship files.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the replication package for the paper "NeoModeling Framework: Leveraging Graph-Based Persistence for Large-Scale Model-Driven Engineering" where we present Neo Modeling Framework (NMF), an open-source set of tools primarily designed to manipulate ultra-large datasets in the Neo4j database.
The best way to run NMF is following the instructions at our GitHub repository. A copy of the Readme file is also present inside the zip file available here.
Make sure that you follow the instructions to run NMF.
The quantitative evaluation can be re-run by running RQ1Eval.kt, RQ2Eval.kt inside modelLoader/src/test/kotlin/evaluation and RQ2Eval.kt inside modelEditor/src/test/kotlin/evaluation.
Make sure that you have an empty instance of Neo4j running.
Results will be generated as CSV files, under Evaluation/results and the results can be plotted by running the Jupyter Notebooks at Evaluation/analysis.
Please note that due to differences in hardware, re-running the experiments will probably generate slightly different results than those reported in the paper.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The dataset is a Neo4j knowledge graph based on TMF Business Process Framework v22.0 data.
CSV files contain data about the model entities, and the JSON file contains knowledge graph mapping.
The script used to generate CSV files based on the XML model can be found here.
To import the dataset, download the zip archive and upload it to Neo4j.
You also can check this dataset here.