Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Synthetic dataset created with GPT-4o
Synthetic dataset of text2cypher over 16 different graph schemas. Questions were generated using GPT-4-turbo, and the corresponding Cypher statements with gpt-4o using Chain of Thought. Only questions that return results when queried against the database are included. For more information visit: https://github.com/neo4j-labs/text2cypher/tree/main/datasets/synthetic_gpt4o_demodbs. The dataset is available as train.csv. Columns are the following:… See the full description on the dataset page: https://huggingface.co/datasets/tomasonjo/text2cypher-gpt4o-clean.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Synthetic dataset created with GPT-4-Turbo
Synthetic dataset of text2cypher over 16 different graph schemas. Both questions and Cypher queries were generated using GPT-4-turbo. The demo database is available at:
URI: neo4j+s://demo.neo4jlabs.com
username: name of the database, for example 'movies'
password: name of the database, for example 'movies'
database: name of the database, for example 'movies'
Notebooks:
generate_text2cypher_questions.ipynb: Generate questions and prepare… See the full description on the dataset page: https://huggingface.co/datasets/tomasonjo/synthetic-text2cypher-gpt4turbo.
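The connection details listed for the demo database can be sketched in Python roughly as follows. This is a minimal sketch, not taken from the dataset card: it assumes the official `neo4j` Python driver is installed, and the helper names `demo_connection_settings` and `run_query` are hypothetical. Per the description above, the username, password, and database name are all the name of the database (for example 'movies').

```python
from typing import Any

# URI quoted in the dataset description.
DEMO_URI = "neo4j+s://demo.neo4jlabs.com"


def demo_connection_settings(database: str) -> dict[str, Any]:
    """Build connection settings for one demo database (hypothetical helper).

    Username, password, and database are all the database name,
    as stated in the dataset description.
    """
    return {
        "uri": DEMO_URI,
        "auth": (database, database),
        "database": database,
    }


def run_query(database: str, cypher: str) -> list[dict[str, Any]]:
    """Run a Cypher statement against a demo database and return the rows.

    The driver is imported lazily so that the settings helper above can be
    used without the `neo4j` package installed. Requires network access.
    """
    from neo4j import GraphDatabase

    settings = demo_connection_settings(database)
    with GraphDatabase.driver(settings["uri"], auth=settings["auth"]) as driver:
        with driver.session(database=settings["database"]) as session:
            return [record.data() for record in session.run(cypher)]
```

A call such as `run_query("movies", "MATCH (n) RETURN count(n) AS nodes")` would then return the result rows as dictionaries, provided the demo server is reachable.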
Dataset Card for "text2cypher-recommendations-ha-sample"
This is a human-annotated data sample for Text2Cypher generation on the recommendations (Movie) database.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
evagelnjy/text2cypher dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for "text2cypher-recommendations-gpt4o-sft-0.5k"
More Information needed
persistent/text2cypher-recommendations-test-sample dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
neo4j/text2cypher-2025v1 dataset hosted on Hugging Face and contributed by the HF Datasets community
persistent/text2cypher-sft-0.1k dataset hosted on Hugging Face and contributed by the HF Datasets community
persistent/text2cypher-recommendations-sft-0.375k dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
maxromanovsky/neo4j-text2cypher-2024v1 dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Gurveer05/text2cypher-small dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Translated Text2Cypher'24 Training Set - Sampled & Multilingual
This dataset provides a sampled and translated training set based on the Neo4j Text2Cypher '24 dataset. It is designed to support research on multilingual natural language to Cypher query generation. We offer two versions of the training set:
1. Multilingual Version (multilang)
Total examples: ~36,000
Languages: English (en), Spanish (es), Turkish (tr)
Samples per language: ~12,000
Translation… See the full description on the dataset page: https://huggingface.co/datasets/mgoNeo4j/translated_text2cypher24_trainset_sampled.
chnug/text2cypher-2024v1-copy dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Translated Text2Cypher'24 Test Set
This dataset provides Spanish (es) and Turkish (tr) translations of the test split of the Neo4j Text2Cypher'24 dataset.
Overview
Only the question field (the user's natural language input) is translated. The original questions are in English (en), and translated versions are available in Spanish (es) and Turkish (tr). All questions across languages are paired with the same Cypher query for consistent evaluation.
Usage Example
from… See the full description on the dataset page: https://huggingface.co/datasets/mgoNeo4j/translated_text2cypher24_testset.
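The pairing described in the overview (one Cypher query shared by the English, Spanish, and Turkish versions of a question) can be illustrated with a small sketch. This is not the dataset card's own usage example, which is truncated above. The field names `question`, `language`, and `cypher`, the helper `group_by_cypher`, and the sample rows are all assumptions made up for illustration.

```python
from collections import defaultdict

# Made-up sample rows; field names are assumptions, not the dataset's schema.
rows = [
    {"language": "en", "question": "How many movies are there?",
     "cypher": "MATCH (m:Movie) RETURN count(m)"},
    {"language": "es", "question": "¿Cuántas películas hay?",
     "cypher": "MATCH (m:Movie) RETURN count(m)"},
    {"language": "tr", "question": "Kaç film var?",
     "cypher": "MATCH (m:Movie) RETURN count(m)"},
]


def group_by_cypher(records):
    """Map each Cypher query to its questions, keyed by language code."""
    grouped = defaultdict(dict)
    for record in records:
        grouped[record["cypher"]][record["language"]] = record["question"]
    return dict(grouped)


grouped = group_by_cypher(rows)
for cypher, questions in grouped.items():
    # Every language version is evaluated against the same reference query.
    print(cypher, sorted(questions))
```

Grouping this way makes it straightforward to compare a model's output per language against a single reference query.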
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
SynthCypher Dataset Repository
Overview
This repository hosts SynthCypher, a novel synthetic dataset designed to bridge the gap in Text-to-Cypher (Text2Cypher) tasks. SynthCypher leverages state-of-the-art large language models (LLMs) to automatically generate and validate high-quality data for training and evaluating models that convert natural language questions into Cypher queries for graph databases like Neo4j. Our dataset and pipeline contribute significantly to… See the full description on the dataset page: https://huggingface.co/datasets/ServiceNow-AI/SynthCypher.