9 datasets found
  1. Neo4j open measurment

    • kaggle.com
    zip
    Updated Feb 15, 2023
    Tom Nijhof-Verhees (2023). Neo4j open measurment [Dataset]. https://www.kaggle.com/datasets/wagenrace/neo4j-open-measurment
    Available download formats: zip (29854808766 bytes)
    Authors
    Tom Nijhof-Verhees
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Kickstart a chemical graph database

    I have spent some time scraping and shaping PubChem data into a Neo4j graph database. The whole process took weeks, mainly for downloading the data and loading it into Neo4j. If you want to build your own, I will show you how to download mine and set it up in less than an hour (most of the time you’ll just have to wait). How this dataset was created is described in the following blogs:
    - https://medium.com/@nijhof.dns/exploring-neodash-for-197m-chemical-full-text-graph-e3baed9615b8
    - https://medium.com/neo4j/combining-3-biochemical-datasets-in-a-graph-database-8e9aafbb5788
    - https://medium.com/p/d9ee9779dfbe

    What do you get?

    The full database is a merge of 3 datasets: PubChem (compounds + synonyms), NCI60 (GI50), and ChEMBL (cell lines). It contains 6 nodes of interest:

    ● Compound: This is related to a compound of PubChem. It has 1 property.
      ○ pubChemCompId: The id within PubChem. So “compound:cid162366967” links to https://pubchem.ncbi.nlm.nih.gov/compound/162366967. This number can be used with both PubChem RDF and PUG.
    ● Synonym: A name found in the literature. This name can refer to zero, one, or more compounds. This helps find relations between natural language names and the absolute compounds they are related to.
      ○ Name: Natural language name. Can contain letters, spaces, numbers, and any other Unicode character.
      ○ pubChemSynId: PubChem synonym id as used within the RDF.
    ● CellLine: These are the ChEMBL cell lines. They hold a lot of information.
      ○ Name: The name of the cell line.
      ○ Uri: A unique URI for every element within the ChEMBL RDF.
      ○ cellosaurusId: The id to connect it to the Cellosaurus dataset. This is one of the most extensive cell line datasets out there.
    ● Measurement: A measurement you can do within a biomedical experiment. Currently, only GI50 (the concentration needed for Growth Inhibition of 50%) is added.
      ○ Name: Name of the measurement.
    ● Condition: A single condition of an experiment. A condition is part of an experiment. Examples are: an individual of the control group, a sample with drug A, or a sample with more CO2.
    ● Experiment: A collection of multiple conditions all done at the same time with the same bias, meaning we assume all uncontrolled variables are the same.
      ○ Name: Name of the experiment.
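As a small illustration of the pubChemCompId property described above, the following Python sketch turns an id of the form shown in the description into its PubChem page URL. The function name and the strict id pattern are assumptions made for this example; only the id format and the base URL come from the description.

```python
import re

# Base URL taken from the example link in the description.
PUBCHEM_COMPOUND_URL = "https://pubchem.ncbi.nlm.nih.gov/compound/"


def pubchem_url(pub_chem_comp_id: str) -> str:
    """Turn an id such as 'compound:cid162366967' into its PubChem page URL.

    The 'compound:cid<digits>' pattern is an assumption based on the single
    example given in the description.
    """
    match = re.fullmatch(r"compound:cid(\d+)", pub_chem_comp_id.strip())
    if match is None:
        raise ValueError(f"unexpected id format: {pub_chem_comp_id!r}")
    return PUBCHEM_COMPOUND_URL + match.group(1)


print(pubchem_url("compound:cid162366967"))
# prints https://pubchem.ncbi.nlm.nih.gov/compound/162366967
```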

    [Figure: Overview of the graph design]

    How to download it

    Warning: you need 120 GB of free disk space. The compressed file you download is already 30 GB, and the uncompressed file is another 30 GB. The database afterward is 60 GB. So 60 GB is only for temporary files; the other 60 GB is for the database. If you do this on an HDD, it will be slow.
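Before starting the download, the space requirement above can be checked with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, and treating "GB" as decimal gigabytes is an assumption, since the description does not say whether GB or GiB is meant.

```python
import shutil

GB = 10**9  # assumption: decimal gigabytes


def has_enough_space(free_bytes: int, required_gb: int = 120) -> bool:
    """Return True if free_bytes covers the required number of gigabytes."""
    return free_bytes >= required_gb * GB


def check_path(path: str = ".", required_gb: int = 120) -> bool:
    """Check the free space of the disk that holds `path`."""
    free = shutil.disk_usage(path).free
    return has_enough_space(free, required_gb)


if __name__ == "__main__":
    print("enough space for the import:", check_path("."))
```

Pass the folder where Neo4j stores its databases as `path`, since that is the disk that has to hold both the temporary files and the final database.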

    If you load this into Neo4j Desktop as a local database (like I do), it will scream and yell at you; just ignore this. We are pushing it much further than it was designed for, but it will still work.

    Download the file

    Go to this Kaggle dataset and download the dump file. Unzip the file, then delete the zipped file. This part needs 60 GB but only takes 30 GB by the end of it.

    Create a database

    Open the Neo4j Desktop app and click “Reveal files in File Explorer”. Move the .dump file you downloaded into this folder.

    Click on the ... behind the .dump file and click Create new DBMS from dump. This database is a dump from Neo4j V4, so your database also needs to be V4.x.x!

    It will now create the database. This will take a long time, and it might even say it has timed out. Do not believe this lie! It is still running in the background. Every time you start it, it will time out. Just let it run and press start again later; the second time, it will start up directly.

    Every time I start it up, I get the timed-out error. After waiting 10 minutes and clicking start again, the database, and with it more than 200 million nodes, is ready. And you are done! Good luck, and let me know what you build with it.
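Once the database has started, a quick sanity check is to count the nodes of one type. The sketch below uses the official `neo4j` Python driver, which is not mentioned in the description. The connection URI, the credentials, and the assumption that the labels are spelled exactly as in the list of nodes above (for example `Compound` or `Synonym`) are placeholders to adapt.

```python
def count_query(label: str) -> str:
    """Build a Cypher query that counts the nodes carrying one label."""
    # Labels cannot be passed as query parameters, so validate before formatting.
    if not label.isidentifier():
        raise ValueError(f"not a plain label: {label!r}")
    return f"MATCH (n:{label}) RETURN count(n) AS n"


def count_nodes(uri: str, user: str, password: str, label: str) -> int:
    """Run the count query against a running database and return the count."""
    # Imported here so count_query() works without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            return session.run(count_query(label)).single()["n"]
    finally:
        driver.close()


# Hypothetical usage; URI and password depend on your local setup:
# count_nodes("bolt://localhost:7687", "neo4j", "<your password>", "Synonym")
```

On a database of this size, even a simple count can take a while the first time it runs.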

  2. Dataset used for "A Recommender System of Buggy App Checkers for App Store...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jun 28, 2021
    Maria Gomez; Romain Rouvoy; Martin Monperrus; Lionel Seinturier (2021). Dataset used for "A Recommender System of Buggy App Checkers for App Store Moderators" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5034291
    Dataset provided by
    University of Lille / Inria
    Authors
    Maria Gomez; Romain Rouvoy; Martin Monperrus; Lionel Seinturier
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the dataset used for the paper "A Recommender System of Buggy App Checkers for App Store Moderators", published at the International Conference on Mobile Software Engineering and Systems (MOBILESoft) in 2015.

    Dataset Collection

    We built a dataset that consists of a random sample of Android app metadata and user reviews available on the Google Play Store in January and March 2014. Since the Google Play Store is continuously evolving (adding, removing and/or updating apps), we updated the dataset twice. The dataset D1 contains the apps available in the Google Play Store in January 2014. Then, we created a new snapshot (D2) of the Google Play Store in March 2014.

    The apps belong to the 27 different categories defined by Google (at the time of writing the paper), and the 4 predefined subcategories (free, paid, new_free, and new_paid). For each category-subcategory pair (e.g. tools-free, tools-paid, sports-new_free, etc.), we collected a maximum of 500 samples, resulting in a median number of 1,978 apps per category.
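The sampling limits above imply simple upper bounds, worked out in the short Python sketch below. It only restates the numbers given in the description; the variable names are illustrative.

```python
CATEGORIES = 27      # categories defined by Google at the time
SUBCATEGORIES = 4    # free, paid, new_free, new_paid
MAX_PER_PAIR = 500   # maximum samples per category-subcategory pair

# Upper bound on apps in one category and in one whole snapshot.
max_per_category = SUBCATEGORIES * MAX_PER_PAIR
max_total = CATEGORIES * max_per_category

print(max_per_category)  # 2000
print(max_total)         # 54000
```

These bounds are consistent with a median just under 2,000 apps per category (reading the reported median as a thousands-separated figure) and with the snapshot sizes of 38,781 and 46,644 apps reported in the dataset stats.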

    For each app, we retrieved the following metadata: name, package, creator, version code, version name, number of downloads, size, upload date, star rating, star counting, and the set of permission requests.

    In addition, for each app, we collected up to the latest 500 reviews posted by users in the Google Play Store. For each review, we retrieved its metadata: title, description, device, and version of the app. None of these fields were mandatory, so several reviews lack some of these details. From all the reviews attached to an app, we only considered the reviews associated with the latest version of the app (i.e., we discarded unversioned and old-versioned reviews). This resulted in a corpus of 1,402,717 reviews (January 2014).

    Dataset Stats

    Some stats about the datasets:

    • D1 (Jan. 2014) contains 38,781 apps requesting 7,826 different permissions, and 1,402,717 user reviews.

    • D2 (Mar. 2014) contains 46,644 apps and 9,319 different permission requests, and 1,361,319 user reviews.

    Additional stats about the datasets are available here.

    Dataset Description

    To store the dataset, we created a graph database with Neo4j. This dataset therefore consists of a graph describing the apps as nodes and edges. We chose a graph database because the graph visualization helps to identify connections among data (e.g., clusters of apps sharing similar sets of permission requests).

    In particular, our dataset graph contains six types of nodes:

    • APP nodes containing the metadata of each app
    • PERMISSION nodes describing permission types
    • CATEGORY nodes describing app categories
    • SUBCATEGORY nodes describing app subcategories
    • USER_REVIEW nodes storing user reviews
    • TOPIC nodes describing topics mined from user reviews (using LDA)

    Furthermore, there are five types of relationships connecting these nodes:

    • USES_PERMISSION relationships between APP and PERMISSION nodes
    • HAS_REVIEW between APP and USER_REVIEW nodes
    • HAS_TOPIC between USER_REVIEW and TOPIC nodes
    • BELONGS_TO_CATEGORY between APP and CATEGORY nodes
    • BELONGS_TO_SUBCATEGORY between APP and SUBCATEGORY nodes
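The node and relationship types listed above can be written down as a small schema table, from which simple Cypher patterns follow. This is a sketch, not the authors' code: the relationship directions are an assumption (the description only says "between"), and the labels in the actual databases may be spelled differently.

```python
# (start label, relationship type, end label), copied from the list above.
# Direction is an assumption; the description only says "between".
RELATIONSHIPS = [
    ("APP", "USES_PERMISSION", "PERMISSION"),
    ("APP", "HAS_REVIEW", "USER_REVIEW"),
    ("USER_REVIEW", "HAS_TOPIC", "TOPIC"),
    ("APP", "BELONGS_TO_CATEGORY", "CATEGORY"),
    ("APP", "BELONGS_TO_SUBCATEGORY", "SUBCATEGORY"),
]


def node_labels(relationships):
    """Collect every node label that appears in the relationship list."""
    labels = set()
    for start, _, end in relationships:
        labels.update((start, end))
    return sorted(labels)


def match_pattern(start, rel_type, end):
    """Build a Cypher query returning a few node pairs joined by rel_type."""
    return f"MATCH (a:{start})-[:{rel_type}]->(b:{end}) RETURN a, b LIMIT 10"


print(node_labels(RELATIONSHIPS))
print(match_pattern(*RELATIONSHIPS[0]))
```

Listing the labels this way confirms that the five relationship types cover all six node types, with HAS_TOPIC being the one relationship that does not start at an APP node.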

    Dataset Files Info

    Neo4j 2.0 Databases

    googlePlayDB1-Jan2014_neo4j_2_0.rar

    googlePlayDB2-Mar2014_neo4j_2_0.rar

    We provide two Neo4j databases containing the 2 snapshots of the Google Play Store (January and March 2014). These are the original databases created for the paper. The databases were created with Neo4j 2.0, specifically with 'Neo4j 2.0.0-M06 Community Edition' (the latest version available at the time of implementing the paper in 2014).

    Neo4j 3.5 Databases

    googlePlayDB1-Jan2014_neo4j_3_5_28.rar

    googlePlayDB2-Mar2014_neo4j_3_5_28.rar

    Neo4j 2.0 is now deprecated and is not available for download in the official Neo4j Download Center. We have migrated the original databases (Neo4j 2.0) to Neo4j 3.5.28. The databases can be opened with 'Neo4j Community Edition 3.5.28'. The tool can be downloaded from the official Neo4j Download page.

    To open the databases with more recent versions of Neo4j, the databases must first be migrated to the corresponding version. Instructions for the migration process can be found in the Neo4j Migration Guide.

    The first time you connect to the Neo4j database, it may request credentials. The username and password are: neo4j/neo4j
    
  3. Twitter Graph Example v2 43

    • kaggle.com
    zip
    Updated Jun 29, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mathias Weiß (2022). Twitter Graph Example v2 43 [Dataset]. https://www.kaggle.com/datasets/weissmedia/twitter-graph-example-v2-43
    Available download formats: zip (17943518 bytes)
    Authors
    Mathias Weiß
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This project is inspired by https://github.com/neo4j-graph-examples/twitter-v2.

    Twitter Graph

    Show data from your personal Twitter account

    The Graph Your Network application inserts your Twitter activity into Neo4j.

    [Figure: Twitter data model, https://neo4jsandbox.com/guides/twitter/img/twitter-data-model.svg]

    Content

    ~10 MB of graph data (CSV)

    43,325 nodes with the labels:
    - Hashtag
    - Link
    - Me
    - Source
    - Tweet
    - User

    57,896 relationships of the types:
    - AMPLIFIES
    - CONTAINS
    - FOLLOWS
    - INTERACTS_WITH
    - MENTIONS
    - POSTS
    - REPLY_TO
    - RETWEETS
    - RT_MENTIONS
    - SIMILAR_TO
    - TAGS
    - USING

  4. Rediscovery Datasets: Connecting Duplicate Reports of Apache, Eclipse, and...

    • data.niaid.nih.gov
    Updated Aug 3, 2024
    Sadat, Mefta; Bener, Ayse Basar; Miranskyy, Andriy V. (2024). Rediscovery Datasets: Connecting Duplicate Reports of Apache, Eclipse, and KDE [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_400614
    Dataset provided by
    Ryerson University
    Authors
    Sadat, Mefta; Bener, Ayse Basar; Miranskyy, Andriy V.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present three defect rediscovery datasets mined from Bugzilla. The datasets capture data for three groups of open source software projects: Apache, Eclipse, and KDE. They contain information about approximately 914 thousand defect reports over a period of 18 years (1999-2017) and capture the inter-relationships among duplicate defects.

    File Descriptions

    apache.csv - Apache Defect Rediscovery dataset

    eclipse.csv - Eclipse Defect Rediscovery dataset

    kde.csv - KDE Defect Rediscovery dataset

    apache.relations.csv - Inter-relations of rediscovered defects of Apache

    eclipse.relations.csv - Inter-relations of rediscovered defects of Eclipse

    kde.relations.csv - Inter-relations of rediscovered defects of KDE

    create_and_populate_neo4j_objects.cypher - Populates the Neo4j graph database by importing all the data from the CSV files. Note that you have to set the dbms.import.csv.legacy_quote_escaping configuration setting to false to load the CSV files, as per https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/#config_dbms.import.csv.legacy_quote_escaping

    create_and_populate_mysql_objects.sql - Populates MySQL RDBMS by importing all the data from the CSV files

    rediscovery_db_mysql.zip - For your convenience, we also provide full backup of the MySQL database

    neo4j_examples.txt - Sample Neo4j queries

    mysql_examples.txt - Sample MySQL queries

    rediscovery_eclipse_6325.png - Output of Neo4j example #1

    distinct_attrs.csv - Distinct values of bug_status, resolution, priority, severity for each project
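The Cypher import script listed above requires dbms.import.csv.legacy_quote_escaping to be set to false. One way to apply such a setting is to edit the text of the Neo4j configuration file. The helper below is a hypothetical sketch that works on the file contents as a string. The location of the configuration file, and whether your Neo4j version still uses this exact setting name, should be checked against the linked documentation.

```python
SETTING = "dbms.import.csv.legacy_quote_escaping"  # name as given in the description


def set_conf(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with `key=value` set.

    An existing line for the key (commented out or not) is replaced;
    otherwise the setting is appended at the end.
    """
    out, found = [], False
    for line in conf_text.splitlines():
        # Strip a leading '#' so commented-out defaults are matched too.
        stripped = line.lstrip("# ").strip()
        if stripped.split("=", 1)[0].strip() == key:
            if not found:
                out.append(f"{key}={value}")
                found = True
            continue  # drop duplicate lines for the same key
        out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


example = "dbms.memory.heap.max_size=1G\n#dbms.import.csv.legacy_quote_escaping=true\n"
print(set_conf(example, SETTING, "false"))
```

Restart the database after changing the configuration so the new value takes effect before running the import script.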

  5. CIS Graph Database and Model

    • figshare.com
    pdf
    Updated Sep 6, 2023
    Stanislava Gardasevic (2023). CIS Graph Database and Model [Dataset]. http://doi.org/10.6084/m9.figshare.21663401.v4
    Available download formats: pdf
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Stanislava Gardasevic
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is based on the model developed with the Ph.D. students of the Communication and Information Sciences Ph.D. program at the University of Hawaii at Manoa, intended to help new students get relevant information. The model was first presented at the iConference 2023, in the paper "Community Design of a Knowledge Graph to Support Interdisciplinary Ph.D. Students" by Stanislava Gardasevic and Rich Gazan (available at: https://scholarspace.manoa.hawaii.edu/server/api/core/bitstreams/9eebcea7-06fd-4db3-b420-347883e6379e/content).

    The database is created in Neo4J, and the .dump file can be imported to the cloud instance of this software. The dataset (.dump) contains publicly available data collected from multiple web locations and indexes of the sample of publications from the people in this domain. In addition, it contains my (the first author's) personal graph demonstrating progress through a student's program in this degree, and the activities they have done while in the program. This dataset was made possible with the huge help of my collaborator, Petar Popovic, who ingested the data into the database.

    The model and dataset were developed while involving the end users in the design and are based on the actual information needs of a population. It is intended to allow researchers to investigate multigraph visualization of the data modeled by the said model.

    The knowledge graph was evaluated with the CIS student population, and the study results show that it is very helpful for decision-making, information discovery, and identification of people in one's surroundings who might be good collaborators or information points. We provide the .json file containing the Neo4J Bloom perspective with the styling and queries used in these evaluation sessions.

  6. Managed Neo4j Services Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 23, 2025
    Growth Market Reports (2025). Managed Neo4j Services Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/managed-neo4j-services-market
    Available download formats: pdf, csv, pptx
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Managed Neo4j Services Market Outlook

    According to our latest research, the global managed Neo4j services market size reached USD 423 million in 2024, reflecting robust demand for graph database solutions across diverse industries. The market is projected to expand at a CAGR of 20.1% from 2025 to 2033, reaching a forecasted value of USD 2.23 billion by 2033. This remarkable growth trajectory is driven by the increasing adoption of connected data analytics, rising digital transformation initiatives, and the need for scalable, flexible, and managed database solutions across enterprises worldwide.
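The growth figures quoted above can be cross-checked with the standard compound-growth formula. The sketch below assumes that the 2024 value is the base and that growth compounds over the nine years to 2033. The report does not state its exact convention, so small differences from its forecast are expected.

```python
def project(value: float, rate: float, years: int) -> float:
    """Compound `value` at annual growth `rate` for `years` years."""
    return value * (1 + rate) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


base_2024 = 423.0       # USD million, from the report summary
forecast_2033 = 2230.0  # USD million (USD 2.23 billion)
years = 2033 - 2024     # assumption: nine compounding periods

projected = project(base_2024, 0.201, years)
rate = implied_cagr(base_2024, forecast_2033, years)

print(f"projected 2033 value: about USD {projected / 1000:.1f} billion")
print(f"implied CAGR: about {rate:.0%}")
```

Under these assumptions the projection lands at roughly USD 2.2 billion and the implied rate at roughly 20%, in line with the quoted 20.1% CAGR and USD 2.23 billion forecast.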

    One of the primary growth factors fueling the managed Neo4j services market is the exponential rise in data complexity and interconnectedness within enterprise environments. Organizations are increasingly recognizing the limitations of traditional relational databases in handling highly connected data, such as social networks, fraud detection, recommendation engines, and supply chain management. Managed Neo4j services, leveraging the power of graph databases, enable businesses to model, store, and analyze complex relationships efficiently. The growing need for real-time insights, enhanced customer experiences, and advanced analytics capabilities is pushing enterprises to adopt managed Neo4j solutions, as these services offer seamless integration, scalability, and expert support for mission-critical applications.

    Another significant driver for the managed Neo4j services market is the widespread shift towards cloud-based and hybrid IT infrastructures. As organizations migrate their workloads to the cloud, managed services become essential for ensuring optimal performance, security, and cost-effectiveness. Managed Neo4j providers offer end-to-end solutions, including consulting, implementation, support, and training, which alleviate the burden on internal IT teams and accelerate time-to-value. The increasing prevalence of multi-cloud strategies, combined with the need for high availability and disaster recovery, further enhances the appeal of managed Neo4j services. Enterprises are also prioritizing compliance and data governance, and managed service providers are well-positioned to deliver solutions that meet regulatory requirements while enabling innovation.

    The managed Neo4j services market is also benefiting from the surge in artificial intelligence, machine learning, and big data analytics initiatives across industries. Graph databases like Neo4j are uniquely suited to support advanced analytics use cases, such as knowledge graphs, identity and access management, and network analysis. As organizations seek to unlock the value of their data assets, managed Neo4j services provide the expertise, tools, and ongoing support needed to deploy and scale graph-based applications. The rise of digital ecosystems, IoT integration, and API-driven architectures is further expanding the addressable market for managed Neo4j services, as enterprises aim to stay competitive in a rapidly evolving digital landscape.

    From a regional perspective, North America continues to dominate the managed Neo4j services market, accounting for the largest share in 2024, driven by early technology adoption, a mature IT services sector, and strong investments in data-driven initiatives. However, Asia Pacific is emerging as the fastest-growing region, with a projected CAGR exceeding 24% during the forecast period, fueled by rapid digitalization, expanding cloud adoption, and government-led innovation programs. Europe, Latin America, and the Middle East & Africa are also witnessing increased demand for managed Neo4j solutions, as enterprises across these regions embrace graph databases to enhance operational efficiency, customer engagement, and compliance.

    Service Type Analysis

    The managed Neo4j services market is segmented by service type into consulting, implementation, support & maintenance, and training. Consulting services represent a critical entry point for organizations embarking on their

  7. DataSheet1_Threat modelling in Internet of Things (IoT) environments using...

    • frontiersin.figshare.com
    zip
    Updated May 30, 2024
    Marwa Salayma (2024). DataSheet1_Threat modelling in Internet of Things (IoT) environments using dynamic attack graphs.ZIP [Dataset]. http://doi.org/10.3389/friot.2024.1306465.s001
    Available download formats: zip
    Dataset provided by
    Frontiers
    Authors
    Marwa Salayma
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This work presents a threat modelling approach to represent changes to the attack paths through an Internet of Things (IoT) environment when the environment changes dynamically, that is, when new devices are added or removed from the system or when whole sub-systems join or leave. The proposed approach investigates the propagation of threats using attack graphs, a popular attack modelling method. However, traditional attack-graph approaches have been applied in static environments that do not continuously change, such as enterprise networks, leading to static and usually very large attack graphs. In contrast, IoT environments are often characterised by dynamic change and interconnections; different topologies for different systems may interconnect with each other dynamically and outside the operator’s control. Such new interconnections lead to changes in the reachability amongst devices according to which their corresponding attack graphs change. This requires dynamic topology and attack graphs for threat and risk analysis. This article introduces an example scenario based on healthcare systems to motivate the work and illustrate the proposed approach. The proposed approach is implemented using a graph database management tool (GDBM), Neo4j, which is a popular tool for mapping, visualising, and querying the graphs of highly connected data. It is efficient in providing a rapid threat modelling mechanism, making it suitable for capturing security changes in the dynamic IoT environment. Our results show that our developed threat modelling approach copes with dynamic system changes that may occur in IoT environments and enables identifying attack paths, whilst allowing for system dynamics. The developed dynamic topology and attack graphs can cope with the changes in the IoT environment efficiently and rapidly by maintaining their associated graphs.

  8. GPU Database Market by Deployment and Geography - Forecast and Analysis...

    • technavio.com
    pdf
    Updated Oct 19, 2021
    Technavio (2021). GPU Database Market by Deployment and Geography - Forecast and Analysis 2021-2025 [Dataset]. https://www.technavio.com/report/gpu-database-market-industry-analysis
    Available download formats: pdf
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2020 - 2025
    Description


    The GPU database market share should rise by USD 361.56 million from 2021 to 2025 at a CAGR of 17.82%.

    This GPU database market research report provides valuable insights on the post COVID-19 impact on the market, which will help companies evaluate their business approaches. Furthermore, this report extensively covers market segmentation by deployment (on-premise and cloud) and geography (North America, Europe, APAC, South America, and MEA). The GPU database market report also offers information on several market vendors, including BlazingSQL Inc., Brytlyt Ltd., Hetero DB Co. Ltd., Jedox GmbH, Kinetica DB Inc., Neo4j Inc., NVIDIA Corp., OmniSci Inc., SQream Technologies Ltd., and Zilliz among others.

    What will the GPU Database Market Size be in 2021?

    To Unlock the GPU Database Market Size for 2021 and Other Important Statistics, Download the Free Report Sample!

    GPU Database Market: Key Drivers and Trends

    The massive data generation across various industries supporting the adoption of GPU accelerated tools is notably driving the GPU database market growth, although factors such as unavailability of enough technical expertise and domain knowledge may impede market growth. Our research analysts have studied the historical data and deduced the key market drivers and the COVID-19 pandemic impact on the GPU database industry. The holistic analysis of the drivers will help in predicting end goals and refining marketing strategies to gain a competitive edge.

    This GPU database market analysis report also provides detailed information on other upcoming trends and challenges that will have a far-reaching effect on the market growth. The actionable insights on the trends and challenges will help companies evaluate and develop growth strategies for 2021-2025.

    Who are the Major GPU Database Market Vendors?

    The report analyzes the market’s competitive landscape and offers information on several market vendors, including:

    BlazingSQL Inc.
    Brytlyt Ltd.
    Hetero DB Co. Ltd.
    Jedox GmbH
    Kinetica DB Inc.
    Neo4j Inc.
    NVIDIA Corp.
    OmniSci Inc.
    SQream Technologies Ltd.
    Zilliz
    

    The vendor landscape of the GPU database market entails successful business strategies deployed by the vendors. The GPU database market is fragmented and the vendors are deploying various organic and inorganic growth strategies to compete in the market.

    To make the most of the opportunities and recover from post COVID-19 impact, market vendors should focus more on the growth prospects in the fast-growing segments, while maintaining their positions in the slow-growing segments.

    Download a free sample of the GPU database market forecast report for insights on complete key vendor profiles. The profiles include information on the production, sustainability, and prospects of the leading companies.

    Which are the Key Regions for GPU Database Market?

    For more insights on the market share of various regions Request for a FREE sample now!

    48% of the market’s growth will originate from North America during the forecast period. The US is the key market for GPU databases in North America.

    North America has been recording a significant growth rate and is expected to offer several growth opportunities to market vendors during the forecast period. The growing demand for artificial intelligence (AI) will facilitate the GPU database market growth in North America over the forecast period. The report offers an up-to-date analysis of the geographical composition of the market, competitive intelligence, and regional opportunities in store for vendors.

    What are the Revenue-generating Deployment Segments in the GPU Database Market?

    To gain further insights on the market contribution of various segments Request for a FREE sample

    The GPU database market share growth by the on-premise segment has been significant. This report provides insights on the impact of the unprecedented outbreak of COVID-19 on market segments. Through these insights, you can safely deduce transformation patterns in consumer behavior, which is crucial to gauge segment-wise revenue growth during 2021-2025 and embrace technologies to improve business efficiency.

    This report provides an accurate prediction of the contribution of all the segments to the growth of the GPU database market size. Furthermore, our analysts have indicated actionable market insights on post COVID-19 impact on each segment, which is crucial to predict change in consumer demand.

    GPU Database Market Scope

    Report Coverage          Details
    Page number              120
    Base year                2020
    Forecast period          2021-2025
    Growth momentum & CAGR   Accelerate at a CAGR of 17.82%
    Market growth 2021-2025  $ 361.56 million
    
  9. translated_text2cypher24_trainset_sampled

    • huggingface.co
    Updated Nov 27, 2025
    MGO (2025). translated_text2cypher24_trainset_sampled [Dataset]. https://huggingface.co/datasets/mgoNeo4j/translated_text2cypher24_trainset_sampled
    Authors
    MGO
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Translated Text2Cypher'24 Training Set - Sampled & Multilingual

    This dataset provides a sampled and translated training set based on the Neo4j Text2Cypher '24 dataset. It is designed to support research on multilingual natural language to Cypher query generation. We offer two versions of the training set:

      1. Multilingual Version (multilang)
    

    Total examples: ~36,000
    Languages: English (en), Spanish (es), Turkish (tr)
    Samples per language: ~12,000
    Translation… See the full description on the dataset page: https://huggingface.co/datasets/mgoNeo4j/translated_text2cypher24_trainset_sampled.


Neo4j open measurment

A graph database with 193 million synonyms
