5 datasets found
  1. TMF Business Process Framework Dataset for Neo4j

    • kaggle.com
    zip
    Updated Dec 4, 2023
    Cite
    Aleksei Golovin (2023). TMF Business Process Framework Dataset for Neo4j [Dataset]. https://www.kaggle.com/datasets/algord/tmf-business-process-framework-dataset-for-neo4j
    Available download formats: zip (13261206 bytes)
    Dataset updated
    Dec 4, 2023
    Authors
    Aleksei Golovin
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    TMF Business Process Framework Dataset for Neo4j

    The dataset is a Neo4j knowledge graph based on TMF Business Process Framework v22.0 data.
    CSV files contain data about the model entities, and the JSON file contains knowledge graph mapping.
    The script used to generate CSV files based on the XML model can be found here.

    To import the dataset, download the zip archive and upload it to Neo4j.

    You can also check this dataset here.

  2. Rediscovery Datasets: Connecting Duplicate Reports of Apache, Eclipse, and...

    • data.niaid.nih.gov
    Updated Aug 3, 2024
    Cite
    Sadat, Mefta; Bener, Ayse Basar; Miranskyy, Andriy V. (2024). Rediscovery Datasets: Connecting Duplicate Reports of Apache, Eclipse, and KDE [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_400614
    Dataset updated
    Aug 3, 2024
    Dataset provided by
    Ryerson University
    Authors
    Sadat, Mefta; Bener, Ayse Basar; Miranskyy, Andriy V.
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    We present three defect rediscovery datasets mined from Bugzilla. The datasets capture data for three groups of open source software projects: Apache, Eclipse, and KDE. They contain information about approximately 914 thousand defect reports over a period of 18 years (1999-2017) and capture the inter-relationships among duplicate defects.

    File Descriptions

    apache.csv - Apache Defect Rediscovery dataset

    eclipse.csv - Eclipse Defect Rediscovery dataset

    kde.csv - KDE Defect Rediscovery dataset

    apache.relations.csv - Inter-relations of rediscovered defects of Apache

    eclipse.relations.csv - Inter-relations of rediscovered defects of Eclipse

    kde.relations.csv - Inter-relations of rediscovered defects of KDE

    create_and_populate_neo4j_objects.cypher - Populates the Neo4j graph database by importing all the data from the CSV files. Note that you must set the dbms.import.csv.legacy_quote_escaping configuration setting to false to load the CSV files, as per https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/#config_dbms.import.csv.legacy_quote_escaping

    create_and_populate_mysql_objects.sql - Populates MySQL RDBMS by importing all the data from the CSV files

    rediscovery_db_mysql.zip - For your convenience, we also provide a full backup of the MySQL database

    neo4j_examples.txt - Sample Neo4j queries

    mysql_examples.txt - Sample MySQL queries

    rediscovery_eclipse_6325.png - Output of Neo4j example #1

    distinct_attrs.csv - Distinct values of bug_status, resolution, priority, severity for each project
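
The quote-escaping note for create_and_populate_neo4j_objects.cypher can be illustrated outside Neo4j. With the setting at false, LOAD CSV follows RFC 4180, where a doubled quote ("") stands for a literal quote and a backslash is an ordinary character; the legacy convention instead lets a backslash escape a quote. The sketch below uses Python's csv module as a stand-in for the two conventions, not Neo4j itself, and the sample row and its column names are hypothetical.

```python
import csv
import io

# Hypothetical sample: the real columns of apache.csv, eclipse.csv, kde.csv may differ.
# The summary field ends with a backslash just before the closing quote.
SAMPLE = 'bug_id,summary\n1,"crash in C:\\temp\\"\n'


def parse_rfc4180(text):
    """RFC 4180 style: "" escapes a quote, a backslash is a literal character."""
    return list(csv.reader(io.StringIO(text), doublequote=True, escapechar=None))


def parse_backslash_escaped(text):
    """Backslash-escape style, loosely mimicking the legacy convention."""
    return list(csv.reader(io.StringIO(text), escapechar="\\"))


rfc_rows = parse_rfc4180(SAMPLE)
legacy_rows = parse_backslash_escaped(SAMPLE)

print("RFC 4180 field:     ", repr(rfc_rows[1][1]))
print("Backslash-escaped:  ", repr(legacy_rows[1][1]))
print("Same result?        ", rfc_rows == legacy_rows)
```

Under RFC 4180 rules the field ending in a backslash is read intact, while the backslash-escaping reader treats the closing quote as escaped and misreads the field. This is the kind of mismatch the setting is meant to avoid.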

  3. TBRNAT: Tuberculosis Regulatory Network Analysis Tool

    • figshare.com
    pdf
    Updated Jul 25, 2023
    Cite
    Michael A. Dolan; Vijayaraj Nagarajan; Ramandeep Kaur; Yamil Boo-Irizarry (2023). TBRNAT: Tuberculosis Regulatory Network Analysis Tool [Dataset]. http://doi.org/10.6084/m9.figshare.23748921.v1
    Available download formats: pdf
    Dataset updated
    Jul 25, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Michael A. Dolan; Vijayaraj Nagarajan; Ramandeep Kaur; Yamil Boo-Irizarry
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    TBRNAT was developed to amalgamate information from peer-reviewed publications related to the regulation of M. tuberculosis transcription factors on gene expression and provide users with an intuitive way to visualize this data. The data provided is in text form and is meant to be used by a graph database such as Neo4j. There are two files: a TbRNAT.csv file and a TBREGNET.csv file. The former contains labels for each transcription factor in the database. The column titles are self-explanatory with the exception of 'id', which is an internal id created for use with Neo4j. The latter file contains 'sourceId' and 'targetId', which are also Neo4j identifiers. 'totalAssociations' is the number of direct associations ("edges") for a particular transcription factor. 'pmid' is the PubMed identifier referencing where the data was obtained.
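
As a rough illustration of how the edge columns relate to a per-factor association count, the sketch below tallies edges per node id from a tiny in-memory sample. Only the column names sourceId, targetId, and pmid come from the description; the rows are made up, and counting both incoming and outgoing edges is an assumption about how 'totalAssociations' is defined.

```python
import csv
import io
from collections import Counter

# Hypothetical rows; only the column names are taken from the dataset description.
EDGES_CSV = """sourceId,targetId,pmid
1,2,100
1,3,101
2,3,102
"""


def count_associations(edges_text):
    """Count the edges touching each node id (assumes in- and out-edges both count)."""
    degree = Counter()
    for row in csv.DictReader(io.StringIO(edges_text)):
        degree[row["sourceId"]] += 1
        degree[row["targetId"]] += 1
    return degree


degree = count_associations(EDGES_CSV)
for node_id, total in sorted(degree.items()):
    print(node_id, total)
```

If the published 'totalAssociations' values count only outgoing edges, the increment on targetId would need to be dropped.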

  4. A dataset of knowledge graph construction for patents, sci-tech achievements...

    • scidb.cn
    Updated Oct 22, 2025
    Cite
    hu hui ling; Zhai Jun; Li Mei; Li Xin; Shen Lixin (2025). A dataset of knowledge graph construction for patents, sci-tech achievements and papers in agriculture, industry and service industry [Dataset]. http://doi.org/10.57760/sciencedb.j00001.01576
    Dataset updated
    Oct 22, 2025
    Dataset provided by
    Science Data Bank
    Authors
    hu hui ling; Zhai Jun; Li Mei; Li Xin; Shen Lixin
    License

    Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
    License information was derived automatically

    Description

    As important carriers of innovation activities, patents, sci-tech achievements and papers play an increasingly prominent role in national political and economic development under the background of a new round of technological revolution and industrial transformation. However, in a distributed and heterogeneous environment, the integration and systematic description of patents, sci-tech achievements and papers data are still insufficient, which limits the in-depth analysis and utilization of related data resources. The dataset of knowledge graph construction for patents, sci-tech achievements and papers is an important means to promote innovation network research, and is of great significance for strengthening the development, utilization, and knowledge mining of innovation data. This work collected data on patents, sci-tech achievements and papers from China's authoritative websites spanning the three major industries—agriculture, industry, and services—during the period 2022-2025. After processes of cleaning, organizing, and normalization, a patents-sci-tech achievements-papers knowledge graph dataset was formed, containing 10 entity types and 8 types of entity relationships. To ensure quality and accuracy of data, the entire process involved strict preprocessing, semantic extraction and verification, with the ontology model introduced as the schema layer of the knowledge graph. The dataset establishes direct correlations among patents, sci-tech achievements and papers through inventors/contributors/authors, and utilizes the Neo4j graph database for storage and visualization. The open dataset constructed in this study can serve as important foundational data for building knowledge graphs in the field of innovation, providing structured data support for innovation activity analysis, scientific research collaboration network analysis and knowledge discovery.

    The dataset consists of two parts. The first part includes three Excel tables: 1,794 patent records with 10 fields, 181 paper records with 7 fields, and 1,156 scientific and technological achievement records with 11 fields. The second part is a knowledge graph dataset in CSV format that can be imported into Neo4j, comprising 10 entity files and 8 relationship files.

  5. NeoModeling Framework: Leveraging Graph-Based Persistence for Large-Scale...

    • zenodo.org
    zip
    Updated Sep 30, 2025
    Cite
    Luciano Marchezan; Nikitchyn Vitalii; Eugene Syriani (2025). NeoModeling Framework: Leveraging Graph-Based Persistence for Large-Scale Model-Driven Engineering (replication package) [Dataset]. http://doi.org/10.5281/zenodo.17238878
    Available download formats: zip
    Dataset updated
    Sep 30, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Luciano Marchezan; Nikitchyn Vitalii; Eugene Syriani
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This is the replication package for the paper "NeoModeling Framework: Leveraging Graph-Based Persistence for Large-Scale Model-Driven Engineering" where we present Neo Modeling Framework (NMF), an open-source set of tools primarily designed to manipulate ultra-large datasets in the Neo4j database.

    Repository structure

    • NeoModelingFramework.zip - contains the replication package, including the source code for NMF, test files to run the evaluation, used artifacts, and instructions to run the framework. The most important folders are listed below:
      • codeGenerator - NMF generator module
      • modelLoader - NMF loader module
      • modelEditor - NMF editor module
      • Evaluation - contains the evaluation artifacts and results (a copy
        • metamodels - Ecore files used for RQ1 and RQ2
        • results - CSV files with the results from RQ1, RQ2 and RQ3
        • analysis - Jupyter notebooks used to analyze and plot the results

    Running NMF

    The best way to run NMF is to follow the instructions in our GitHub repository. A copy of the Readme file is also included in the zip file available here.

    Empirical Evaluation

    Make sure that you follow the instructions to run NMF.

    The quantitative evaluation can be re-run by running RQ1Eval.kt and RQ2Eval.kt inside modelLoader/src/test/kotlin/evaluation, and RQ2Eval.kt inside modelEditor/src/test/kotlin/evaluation.

    Make sure that you have an empty instance of Neo4j running.


    Results are generated as CSV files under Evaluation/results and can be plotted by running the Jupyter notebooks in Evaluation/analysis.

    Please note that due to differences in hardware, re-running the experiments will probably generate slightly different results than those reported in the paper.


TMF Business Process Framework Dataset for Neo4j

Neo4j knowledge graph based on TMF Business Process Framework data

3 scholarly articles cite this dataset (View in Google Scholar)