Rajpreet2206/nl2sql-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset is designed for training models to convert natural language prompts into SQL queries, specifically focusing on SELECT statements. The dataset comprises 14,815 examples where each prompt is associated with the corresponding SQL query that would retrieve the desired information from a specific table.
Columns: Prompt: The natural language text representing a query request. SQL Query: The corresponding SQL query generated to fulfill the request.
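Since the dataset is described as covering SELECT statements only, a quick sanity check on loaded rows could look like the sketch below. It assumes each row is a dict keyed by the two column names listed above; the sample rows are invented for illustration and are not taken from the dataset.

```python
import re

# Matches statements whose first keyword is SELECT (case-insensitive).
_SELECT_RE = re.compile(r"^\s*select\b", re.IGNORECASE)


def is_select(sql: str) -> bool:
    """Return True if the statement begins with the SELECT keyword."""
    return _SELECT_RE.match(sql) is not None


def keep_select_rows(rows):
    """Keep only rows whose "SQL Query" column is a SELECT statement."""
    return [row for row in rows if is_select(row["SQL Query"])]


# Hypothetical rows using the column names from the description.
sample_rows = [
    {"Prompt": "Show all employee names", "SQL Query": "SELECT name FROM employees"},
    {"Prompt": "Remove old logs", "SQL Query": "DELETE FROM logs"},
]
select_only = keep_select_rows(sample_rows)
```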
Shritama/nl2sql dataset hosted on Hugging Face and contributed by the HF Datasets community
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by KingPawnUSA
Released under CC0: Public Domain
NormalMatt/nl2sql dataset hosted on Hugging Face and contributed by the HF Datasets community
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NL2SQL for BI dataset
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset accompanies our current submission to the MSR 2024 Data and Tool Showcase Track.
This dataset was created by Himanshu Nayal
ManoharPalanisamy/NL2SQL dataset hosted on Hugging Face and contributed by the HF Datasets community
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NL2SQL Deduplicated Training Dataset
A curated and deduplicated Text-to-SQL training dataset with 683,015 unique examples from 4 high-quality sources.
📊 Dataset Summary
Total Examples: 683,015 unique question-SQL pairs
Sources: Spider, SQaLe, Gretel Synthetic, SQL-Create-Context
Deduplication Strategy: Input-only (question-based) with conflict resolution via quality priority
Conflicts Resolved: 2,238 cases where same question had different SQL
SQL Dialect: Standard SQL…
See the full description on the dataset page: https://huggingface.co/datasets/AsadIsmail/nl2sql-deduplicated.
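To make the deduplication strategy in the summary concrete, here is a minimal sketch of question-based deduplication with priority-based conflict resolution. It is not the dataset author's code: the priority order, the source labels, the question normalization, and the record field names are all assumptions made for illustration.

```python
# Hypothetical quality ranking (lower is better). The summary names the four
# sources but does not state how they are ranked.
PRIORITY = {"spider": 0, "sqale": 1, "gretel": 2, "sql_create_context": 3}


def normalize(question: str) -> str:
    """Assumed normalization: lowercase and collapse runs of whitespace."""
    return " ".join(question.lower().split())


def deduplicate(examples):
    """Keep one example per question, preferring the higher-priority source.

    Returns the kept examples and the number of questions that appeared
    with more than one distinct SQL string (the "conflicts").
    """
    best = {}       # normalized question -> example currently kept
    sql_seen = {}   # normalized question -> set of distinct SQL strings
    for ex in examples:
        key = normalize(ex["question"])
        sql_seen.setdefault(key, set()).add(ex["sql"])
        kept = best.get(key)
        if kept is None or PRIORITY[ex["source"]] < PRIORITY[kept["source"]]:
            best[key] = ex
    conflicts = sum(1 for sqls in sql_seen.values() if len(sqls) > 1)
    return list(best.values()), conflicts


# Invented examples: the first two share a question but differ in SQL.
examples = [
    {"question": "How many users?", "sql": "SELECT COUNT(*) FROM users", "source": "gretel"},
    {"question": "how many  users?", "sql": "SELECT COUNT(id) FROM users", "source": "spider"},
    {"question": "List names", "sql": "SELECT name FROM t", "source": "sqale"},
]
unique, conflicts = deduplicate(examples)
```

Keying on the question alone means two examples with the same question and different SQL cannot both survive, which is why a tie-breaking rule such as the priority table is needed.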
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
simone-papicchio/nl2sql-reasoning-trace dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset was created by Zuo Zhaorui
sirabhop/nl2sql_food_fieldname dataset hosted on Hugging Face and contributed by the HF Datasets community
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Authors: Peixian Ma (1, 2), Xialie Zhuang (1, 3), Chengjin Xu (1, 4), Xuhui Jiang (1, 4), Ran Chen (1), Jian Guo (1)
Affiliations:
(1) IDEA Research, International Digital Economy Academy
(2) The Hong Kong University of Science and Technology (Guangzhou)
(3) University of Chinese Academy of Science
(4) DataArc Tech Ltd.
📖 Overview
Natural Language to SQL (NL2SQL) enables intuitive interactions… See the full description on the dataset page: https://huggingface.co/datasets/MPX0222forHF/SynSQL-Complex-5K.
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Card for Spider
Dataset Summary
Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases.
Supported Tasks and Leaderboards
The leaderboard can be seen at https://yale-lily.github.io/spider
Languages
The text in the dataset is in English.
Dataset Structure
Data… See the full description on the dataset page: https://huggingface.co/datasets/xlangai/spider.
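Integrates three Chinese datasets: Zhuiyi Technology's NL2SQL, Westlake University's Chinese translation of Spider (CSpider), and Baidu's DuSQL.
The data was roughly cleaned and converted to the Alpaca format, using the following prompt template (given here in translation):
Assume you are a database SQL expert. Below I will give you information about a MySQL database; please generate the corresponding SQL statement for the question. The current year is 2023. The format is as follows: {'sql': SQL statement} The MySQL database structure is as follows: {table name(field names...)} where: {primary and foreign key relationships between tables} For the query: "{question}", give the corresponding SQL statement, returned in the required format, without any explanation.
DuSQL contributes 25,004 examples. NL2SQL contributes 45,919 examples; note that its table names are garbled. CSpider contributes 7,786 examples; note that its databases are in English while the questions are in Chinese.
The final file contains 78,706 examples in total. Sample: { "instruction": "Assume you are a database SQL expert. Below I will give you information about a MySQL database; please generate the corresponding SQL statement for the question. The current year is 2023.", "input":… See the full description on the dataset page: https://huggingface.co/datasets/lorinma/NL2SQL_zh.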
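As a rough illustration of the prompt template described above, the sketch below assembles one Alpaca-style record. It uses an English rendering of the template, whereas the dataset itself stores Chinese text. The function names, the split of the template between "instruction" and "input", and the "output" key are assumptions, since the sample in the listing is truncated after "input".

```python
import json

# English rendering of the instruction text (assumption: the dataset stores
# the Chinese original in the "instruction" field).
INSTRUCTION = (
    "Assume you are a database SQL expert. Below I will give you information "
    "about a MySQL database; please generate the corresponding SQL statement "
    "for the question. The current year is 2023."
)


def build_input(tables, relations, question):
    """Render schema, key relationships, and the question into the input text."""
    schema = "\n".join(f"{name}({', '.join(cols)})" for name, cols in tables.items())
    keys = "\n".join(relations)
    return (
        "The format is as follows: {'sql': SQL statement}\n"
        f"The MySQL database structure is as follows:\n{schema}\n"
        f"where:\n{keys}\n"
        f'For the query: "{question}", give the corresponding SQL statement, '
        "returned in the required format, without any explanation."
    )


def build_record(tables, relations, question, sql):
    """Build one Alpaca-style training record."""
    return {
        "instruction": INSTRUCTION,
        "input": build_input(tables, relations, question),
        # "output" is the customary Alpaca key; assumed here, not confirmed
        # by the truncated sample.
        "output": json.dumps({"sql": sql}, ensure_ascii=False),
    }


# Hypothetical schema and question, for illustration only.
record = build_record(
    {"users": ["id", "name"], "orders": ["id", "user_id"]},
    ["orders.user_id = users.id"],
    "How many users are there?",
    "SELECT COUNT(*) FROM users",
)
```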