10 datasets found
  1. nl2sql

    • huggingface.co
    Cite
    Tae-Hyoung Choi, nl2sql [Dataset]. https://huggingface.co/datasets/selmoch/nl2sql
    Authors
    Tae-Hyoung Choi
    Description

    selmoch/nl2sql dataset hosted on Hugging Face and contributed by the HF Datasets community

  2. NL2SQL for BI Dataset

    • figshare.com
    zip
    Updated Dec 8, 2023
    Cite
    Bora Caglayan (2023). NL2SQL for BI Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.24771738.v2
    Available download formats: zip
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    figshare
    Authors
    Bora Caglayan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    NL2SQL for BI dataset

  3. nl2sql-dataset

    • huggingface.co
    Updated May 28, 2025
    Cite
    Armaghan Shakir (2025). nl2sql-dataset [Dataset]. https://huggingface.co/datasets/geetu040/nl2sql-dataset
    Dataset updated
    May 28, 2025
    Authors
    Armaghan Shakir
    Description

    geetu040/nl2sql-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. nl2sql-500k

    • huggingface.co
    Updated Apr 10, 2025
    Cite
    Lie (2025). nl2sql-500k [Dataset]. https://huggingface.co/datasets/Lie24/nl2sql-500k
    Dataset updated
    Apr 10, 2025
    Authors
    Lie
    Description

    Lie24/nl2sql-500k dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. NL2SQL-Queries

    • huggingface.co
    Cite
    Haozhe Wang, NL2SQL-Queries [Dataset]. https://huggingface.co/datasets/JasperHaozhe/NL2SQL-Queries
    Authors
    Haozhe Wang
    Description

    JasperHaozhe/NL2SQL-Queries dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation)...

    • paperswithcode.com
    Updated Sep 24, 2024
    Cite
    Jinyang Li; Binyuan Hui; Ge Qu; Jiaxi Yang; Binhua Li; Bowen Li; Bailin Wang; Bowen Qin; Rongyu Cao; Ruiying Geng; Nan Huo; Xuanhe Zhou; Chenhao Ma; Guoliang Li; Kevin C. C. Chang; Fei Huang; Reynold Cheng; Yongbin Li (2024). BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) Dataset [Dataset]. https://paperswithcode.com/dataset/bird-sql
    Dataset updated
    Sep 24, 2024
    Authors
    Jinyang Li; Binyuan Hui; Ge Qu; Jiaxi Yang; Binhua Li; Bowen Li; Bailin Wang; Bowen Qin; Rongyu Cao; Ruiying Geng; Nan Huo; Xuanhe Zhou; Chenhao Ma; Guoliang Li; Kevin C. C. Chang; Fei Huang; Reynold Cheng; Yongbin Li
    Description

    BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) is a pioneering cross-domain dataset that examines the impact of large database contents on text-to-SQL parsing. BIRD contains 12,751 unique question-SQL pairs and 95 large databases with a total size of 33.4 GB, covering more than 37 professional domains such as blockchain, hockey, healthcare, and education.

  7. SEDE Dataset

    • paperswithcode.com
    Updated Aug 8, 2024
    Cite
    Moshe Hazoom; Vibhor Malik; Ben Bogin (2024). SEDE Dataset [Dataset]. https://paperswithcode.com/dataset/sede
    Dataset updated
    Aug 8, 2024
    Authors
    Moshe Hazoom; Vibhor Malik; Ben Bogin
    Description

    SEDE is a dataset of 12,023 complex and diverse SQL queries paired with their natural-language titles and descriptions, written by real users of the Stack Exchange Data Explorer during natural interaction. These pairs contain a variety of real-world challenges that have rarely been reflected in other semantic parsing datasets. The goal of the dataset is to take a significant step towards evaluating Text-to-SQL models in a real-world setting. Compared to other Text-to-SQL datasets, SEDE contains at least 10 times more SQL query templates (queries after canonicalization and anonymization of values), and it has the most diverse set of utterances and SQL queries (in terms of 3-grams) of all single-domain datasets. SEDE introduces real-world challenges such as under-specification, the use of parameters in queries, and date manipulation.

  8. spider

    • huggingface.co
    • opendatalab.com
    Updated Dec 9, 2021
    Cite
    XLang NLP Lab (2021). spider [Dataset]. https://huggingface.co/datasets/xlangai/spider
    Croissant: a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Dec 9, 2021
    Dataset authored and provided by
    XLang NLP Lab
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for Spider

      Dataset Summary

    Spider is a large-scale, complex, cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases.

      Supported Tasks and Leaderboards

    The leaderboard is available at https://yale-lily.github.io/spider

      Languages

    The text in the dataset is in English.

      Dataset Structure

      Data… See the full description on the dataset page: https://huggingface.co/datasets/xlangai/spider.
    
  9. SynSQL-Complex-5K

    • huggingface.co
    Updated Jun 15, 2025
    Cite
    Peixian Ma (2025). SynSQL-Complex-5K [Dataset]. https://huggingface.co/datasets/MPX0222forHF/SynSQL-Complex-5K
    Dataset updated
    Jun 15, 2025
    Authors
    Peixian Ma
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

    Peixian Ma (1, 2), Xialie Zhuang (1, 3), Chengjin Xu (1, 4), Xuhui Jiang (1, 4), Ran Chen (1), Jian Guo (1)

    1 IDEA Research, International Digital Economy Academy
    2 The Hong Kong University of Science and Technology (Guangzhou)
    3 University of Chinese Academy of Sciences
    4 DataArc Tech Ltd.

      📖 Overview

    Natural Language to SQL (NL2SQL) enables intuitive interactions… See the full description on the dataset page: https://huggingface.co/datasets/MPX0222forHF/SynSQL-Complex-5K.

  10. NSText2SQL

    • huggingface.co
    • opendatalab.com
    Updated Feb 23, 2024
    Cite
    NumbersStation (2024). NSText2SQL [Dataset]. https://huggingface.co/datasets/NumbersStation/NSText2SQL
    Croissant: a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Feb 23, 2024
    Dataset authored and provided by
    NumbersStation
    License

    https://choosealicense.com/licenses/other/

    Description

    Dataset Summary

    The NSText2SQL dataset was used to train the NSQL models. The data is curated from more than 20 different public sources across the web with permissive licenses (listed on the dataset page). All of these datasets come with existing text-to-SQL pairs. We apply various data cleaning and pre-processing techniques, including table schema augmentation, SQL cleaning, and instruction generation using existing LLMs. The resulting dataset contains around 290,000 text-to-SQL samples. For more… See the full description on the dataset page: https://huggingface.co/datasets/NumbersStation/NSText2SQL.

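BIRD (entry 6) stresses that database contents, not just schemas, affect text-to-SQL parsing. A common way to score predictions on such benchmarks is execution accuracy: run the predicted and gold SQL against the same database and compare the results. The Python sketch below assumes the databases are available as SQLite files and compares result rows as an unordered multiset; the function names and the toy `player` table are hypothetical, and this is not BIRD's official evaluation script.

```python
import sqlite3
from collections import Counter


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Return True if both queries produce the same multiset of result rows.

    Row order is ignored (Counter compares rows as a multiset), which is a
    simplification: queries with ORDER BY may need an order-sensitive check.
    """
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A prediction that fails to parse or execute counts as a miss.
        return False
    # The gold query is assumed to be valid; errors here are not swallowed.
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(predicted_rows) == Counter(gold_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs) -> float:
    """Fraction of (predicted, gold) pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy database standing in for a real benchmark database.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE player (name TEXT, team TEXT, goals INTEGER);
        INSERT INTO player VALUES ('Ann', 'Hawks', 12), ('Bo', 'Hawks', 7), ('Cy', 'Owls', 12);
        """
    )
    pairs = [
        # Different SQL text, same result rows: counts as a match.
        ("SELECT name FROM player WHERE goals >= 12",
         "SELECT name FROM player WHERE goals = 12"),
        # Wrong column selected: no match.
        ("SELECT team FROM player WHERE goals = 12",
         "SELECT name FROM player WHERE goals = 12"),
    ]
    print(execution_accuracy(conn, pairs))  # 0.5
```

Comparing executed results rather than SQL strings credits predictions that are written differently but return the same answer, which is why the first pair above counts as correct.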
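SEDE (entry 7) measures diversity by counting SQL templates (queries after canonicalization and anonymization of values) and distinct 3-grams. The sketch below illustrates both ideas with a few hypothetical regex rules: string and numeric literals become placeholders, whitespace and case are normalized, and word 3-grams are collected into a set. SEDE's actual canonicalization procedure may differ.

```python
import re

# Hypothetical canonicalization rules; SEDE's own procedure may differ.
STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")  # 'text', allowing '' escapes
NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")  # 42 or 3.5, but not the 1 in col1
WHITESPACE = re.compile(r"\s+")


def sql_template(sql: str) -> str:
    """Anonymize literal values, then normalize whitespace and case."""
    template = STRING_LITERAL.sub("<str>", sql)  # strings first, so digits inside them vanish
    template = NUMBER_LITERAL.sub("<num>", template)
    template = WHITESPACE.sub(" ", template).strip()
    return template.lower()


def distinct_trigrams(texts) -> set:
    """Return the set of word 3-grams across all texts (a simple diversity proxy)."""
    grams = set()
    for text in texts:
        tokens = text.lower().split()
        grams.update(zip(tokens, tokens[1:], tokens[2:]))
    return grams


if __name__ == "__main__":
    queries = [
        "SELECT Id FROM Posts WHERE Score > 10",
        "select Id   from Posts where Score > 250",
        "SELECT Id FROM Users WHERE DisplayName = 'Jon'",
    ]
    # The first two queries differ only in literals, case, and spacing.
    templates = {sql_template(q) for q in queries}
    print(len(templates))  # 2
    # 3-grams are counted on the raw queries, so 10 and 250 stay distinct.
    print(len(distinct_trigrams(queries)))  # 12
```

Counting distinct templates instead of raw queries keeps a dataset from looking diverse merely because the same query shape appears with many different constants.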
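NSText2SQL (entry 10) lists table schema augmentation among its pre-processing steps. One plausible form of this, sketched below under stated assumptions, reads the `CREATE TABLE` statements from a SQLite database's `sqlite_master` catalog and prepends them to the question. The prompt wording and the function names are hypothetical; this is not the format used to build NSText2SQL.

```python
import sqlite3


def schema_ddl(conn: sqlite3.Connection) -> str:
    """Return the stored CREATE TABLE statements, separated by blank lines."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' AND sql IS NOT NULL "
        "ORDER BY name"
    ).fetchall()
    return "\n\n".join(sql + ";" for (sql,) in rows)


def build_prompt(conn: sqlite3.Connection, question: str) -> str:
    """Prepend the schema to the question (hypothetical prompt template)."""
    return (
        f"{schema_ddl(conn)}\n\n"
        "-- Using valid SQLite, answer the following question for the tables above.\n"
        f"-- {question}\n"
        "SELECT"
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stadium (id INTEGER PRIMARY KEY, name TEXT, capacity INTEGER)")
    conn.execute("CREATE TABLE concert (id INTEGER, stadium_id INTEGER REFERENCES stadium(id))")
    print(build_prompt(conn, "How many concerts were held at each stadium?"))
```

Ending the prompt with `SELECT` nudges a completion model to continue with a query rather than prose; whether that helps depends on the model and is an assumption of this sketch.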
