17 datasets found
  1. BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation)...

    • paperswithcode.com
    Updated Jan 5, 2024
    Cite
    Jinyang Li; Binyuan Hui; Ge Qu; Jiaxi Yang; Binhua Li; Bowen Li; Bailin Wang; Bowen Qin; Rongyu Cao; Ruiying Geng; Nan Huo; Xuanhe Zhou; Chenhao Ma; Guoliang Li; Kevin C. C. Chang; Fei Huang; Reynold Cheng; Yongbin Li (2024). BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) Dataset [Dataset]. https://paperswithcode.com/dataset/bird-sql
    Dataset updated
    Jan 5, 2024
    Authors
    Jinyang Li; Binyuan Hui; Ge Qu; Jiaxi Yang; Binhua Li; Bowen Li; Bailin Wang; Bowen Qin; Rongyu Cao; Ruiying Geng; Nan Huo; Xuanhe Zhou; Chenhao Ma; Guoliang Li; Kevin C. C. Chang; Fei Huang; Reynold Cheng; Yongbin Li
    Description

    BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) represents a pioneering, cross-domain dataset that examines the impact of extensive database contents on text-to-SQL parsing. BIRD contains over 12,751 unique question-SQL pairs and 95 big databases with a total size of 33.4 GB. It also covers more than 37 professional domains, including blockchain, hockey, healthcare, and education.

  2. BIRD-SQL-data

    • huggingface.co
    Cite
    Wen-Ding Li, BIRD-SQL-data [Dataset]. https://huggingface.co/datasets/xu3kev/BIRD-SQL-data
    Authors
    Wen-Ding Li
    Description

    Dataset Card for "BIRD-SQL-data"

    More Information needed

  3. bird

    • huggingface.co
    Updated Jul 3, 2024
    Cite
    Mic (2024). bird [Dataset]. https://huggingface.co/datasets/micpst/bird
    Dataset updated
    Jul 3, 2024
    Authors
    Mic
    Description

    BIRD-SQL

    Data from BIRD-SQL benchmark dev set (last release Jul 3, 2024). Ref: https://bird-bench.github.io

  4. BIRD-SQL

    • huggingface.co
    Cite
    Deema, BIRD-SQL [Dataset]. https://huggingface.co/datasets/Deema/BIRD-SQL
    Authors
    Deema
    Description

    Deema/BIRD-SQL dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. BIRD-SQL-data-train-formatted

    • huggingface.co
    Updated Feb 6, 2024
    Cite
    Benjamin Li (2024). BIRD-SQL-data-train-formatted [Dataset]. https://huggingface.co/datasets/benjamintli/BIRD-SQL-data-train-formatted
    Dataset updated
    Feb 6, 2024
    Authors
    Benjamin Li
    Description

    benjamintli/BIRD-SQL-data-train-formatted dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. bird-critic-1.0-flash-exp

    • huggingface.co
    Cite
    The BIRD Team, bird-critic-1.0-flash-exp [Dataset]. https://huggingface.co/datasets/birdsql/bird-critic-1.0-flash-exp
    Dataset authored and provided by
    The BIRD Team
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    BIRD-CRITIC-1.0-Flash

    BIRD-Critic is the first SQL debugging benchmark designed to answer a critical question: Can large language models (LLMs) fix user issues in real-world database applications? Each task in BIRD-CRITIC has been verified by human experts on the following dimensions:

    • Reproduction of errors on BIRD env to prevent data leakage.
    • Carefully curate test case functions for each task specifically.
    • Soft EX: This metric can evaluate SELECT-ONLY tasks.
    • Soft EX + Parsing:… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/bird-critic-1.0-flash-exp.

  7. bird-interact-lite

    • huggingface.co
    Updated Jun 9, 2025
    Cite
    The BIRD Team (2025). bird-interact-lite [Dataset]. https://huggingface.co/datasets/birdsql/bird-interact-lite
    Dataset updated
    Jun 9, 2025
    Dataset authored and provided by
    The BIRD Team
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    🧸 Overview

    BIRD-INTERACT, an interactive text-to-SQL benchmark, re-imagines Text-to-SQL evaluation through the lens of dynamic interactions. The environment blends a hierarchical knowledge base, database documentation, and a function-driven user simulator to recreate authentic enterprise environments across full CRUD operations. It offers two rigorous test modes: (1) passive Conversational Interaction and (2) active Agentic Interaction, spanning 600 annotated tasks including Business… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/bird-interact-lite.

  8. bird-critic-1.0-open

    • huggingface.co
    Updated Apr 25, 2025
    Cite
    The BIRD Team (2025). bird-critic-1.0-open [Dataset]. https://huggingface.co/datasets/birdsql/bird-critic-1.0-open
    Dataset updated
    Apr 25, 2025
    Dataset authored and provided by
    The BIRD Team
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Update 2025-05-22

    The previous issue regarding mismatched MySQL instances has been resolved. The updated version of BIRD-CRITIC-Open is now available. Thank you for your patience and understanding.

    Update 2025-04-25

    We’ve identified a mismatch issue in some uploaded MySQL instances. Our team is actively working to resolve this, and we’ll release the updated version promptly. Please refrain from using MySQL until the fix is deployed. Apologies for any inconvenience caused.… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/bird-critic-1.0-open.

  9. bird-critic-1.0-postgresql

    • huggingface.co
    Updated Jun 8, 2025
    Cite
    The BIRD Team (2025). bird-critic-1.0-postgresql [Dataset]. https://huggingface.co/datasets/birdsql/bird-critic-1.0-postgresql
    Dataset updated
    Jun 8, 2025
    Dataset authored and provided by
    The BIRD Team
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Update 2025-06-08

    We release the full version of BIRD-Critic-PG, a dataset containing 530 high-quality user issues focused on real-world PostgreSQL database applications. The schema file is included in the code repository: https://github.com/bird-bench/BIRD-CRITIC-1/blob/main/baseline/data/post_schema.jsonl

    BIRD-CRITIC-1.0-PG

    BIRD-Critic is the first SQL debugging benchmark designed to answer a critical question: Can large language models (LLMs) fix user issues in… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/bird-critic-1.0-postgresql.

  10. bird_mini_dev

    • huggingface.co
    Updated Jul 4, 2025
    Cite
    The BIRD Team (2025). bird_mini_dev [Dataset]. https://huggingface.co/datasets/birdsql/bird_mini_dev
    Dataset updated
    Jul 4, 2025
    Dataset authored and provided by
    The BIRD Team
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    BIRD-SQL Mini-Dev

    Update 2025-07-04

    We are grateful for the valuable feedback from the community over the past year regarding BIRD Mini-Dev. Based on your suggestions, we have made significant updates to the BIRD Mini-Dev dataset.

    For New Users

    If you are new to BIRD Mini-Dev, you can download the complete databases and datasets using the following link: Download BIRD Mini-Dev Complete Package

    For Existing Users

    If you have already downloaded the… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/bird_mini_dev.

  11. Data from: Climate change does not equally affect temporal patterns of...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Sep 25, 2023
    Cite
    Marcel E. Visser; Cherine Jantzen (2023). Climate change does not equally affect temporal patterns of natural selection on reproductive timing across populations in two songbird species [Dataset]. http://doi.org/10.5061/dryad.1zcrjdfz0
    Available download formats: zip
    Dataset updated
    Sep 25, 2023
    Dataset provided by
    Netherlands Institute of Ecology
    Authors
    Marcel E. Visser; Cherine Jantzen
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Climate change has led to changes in the strength of directional selection on seasonal timing. Understanding the causes and consequences of these changes is crucial to predicting the impact of climate change. But are observed patterns in one population generalisable to others, and can spatial variation in selection be explained by environmental variation among populations? We used long-term data (1955–2022) on blue and great tits co-occurring in four locations across the Netherlands to assess inter-population variation in temporal patterns of selection on laying date. To analyse selection, we combine reproduction and adult survival into a joint fitness measure. We found distinct spatial variation in temporal patterns of selection, which overall acted towards earlier laying and which was due to selection through reproduction rather than through survival. The underlying relationships between temperature, bird and caterpillar phenology were, however, the same across populations, and the spatial variation in selection patterns is thus caused by spatial variation in the temperatures and other habitat characteristics to which birds and caterpillars respond. This underlines that climate change is not necessarily equally affecting populations, but that we can understand this spatial variation, which enables us to predict climate change effects on selection for other populations.

    Methods: Long-term data on breeding birds were collected by regular nest checks and by capturing and ringing birds. Data on caterpillar biomass were collected using frass nets. All data were stored in a relational SQL database and analysed using R.

  12. bird-rl

    • huggingface.co
    Cite
    Rihong, bird-rl [Dataset]. https://huggingface.co/datasets/Rihong/bird-rl
    Authors
    Rihong
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    BIRD-RL

    This dataset is a processed version of BIRD-SQL for post-training.

  13. livesqlbench-base-lite

    • huggingface.co
    Cite
    The BIRD Team, livesqlbench-base-lite [Dataset]. https://huggingface.co/datasets/birdsql/livesqlbench-base-lite
    Dataset authored and provided by
    The BIRD Team
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    🚀 LiveSQLBench-Base-Lite

    A dynamic, contamination-free benchmark for evaluating LLMs on complex, real-world text-to-SQL tasks.
    🌐 Website • 📄 Paper (coming soon) • 💻 GitHub
    Maintained by the 🦜 BIRD Team @ HKU & ☁️ Google Cloud

    📊 LiveSQLBench Overview

    LiveSQLBench (BIRD-SQL Pro v0.5) is a contamination-free, continuously evolving benchmark designed to evaluate LLMs on complex, real-world text-to-SQL tasks, featuring diverse real-world user queries, including… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/livesqlbench-base-lite.

  14. bird-sql-portuguese

    • huggingface.co
    Cite
    Breno, bird-sql-portuguese [Dataset]. https://huggingface.co/datasets/Boakpe/bird-sql-portuguese
    Authors
    Breno
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    BIRD-SQL - Portuguese Version

    This repository contains the Portuguese translation of the training and development splits of the BIRD-SQL benchmark, a benchmark for the Text-to-SQL task.

  15. text2sql-dataset

    • huggingface.co
    Updated Jun 20, 2025
    Cite
    text2sql-dataset [Dataset]. https://huggingface.co/datasets/fahmiaziz/text2sql-dataset
    Dataset updated
    Jun 20, 2025
    Authors
    Fahmi Aziz Fadhil
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    We built this dataset by combining examples from several sources:

    • WikiSQL
    • BIRD
    • Spider
    • Synthetic SQL samples

    This dataset has been cleaned and filtered by:

    • Removing DDL/DML examples (INSERT, UPDATE, DELETE, etc.)
    • De-duplicating examples based on hashing semantics of SQL and queries
    • Filtering only SELECT-style analytical queries
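
    The cleaning steps listed above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the dataset author's actual pipeline: the record layout (`question` and `sql` keys), the helper names, and the use of a SHA-256 hash over lightly normalized text are all hypothetical, and the normalization is only a crude stand-in for "hashing semantics".

```python
import hashlib
import re

# Assumption: a SELECT-style query starts with SELECT or WITH (a CTE).
# Anything else (INSERT, UPDATE, DELETE, CREATE, ...) is treated as DDL/DML.
SELECT_STYLE = re.compile(r"^\s*(SELECT|WITH)\b", re.IGNORECASE)


def normalize(text):
    """Collapse whitespace, drop a trailing semicolon, and lowercase.

    Used only to build the de-duplication key; the stored text is untouched.
    """
    collapsed = re.sub(r"\s+", " ", text).strip()
    return collapsed.rstrip(";").strip().lower()


def example_key(example):
    """Hash the normalized question and SQL together (hypothetical scheme)."""
    payload = normalize(example["question"]) + "\x1f" + normalize(example["sql"])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def clean(examples):
    """Keep SELECT-style examples only and drop duplicates by hash key."""
    seen = set()
    kept = []
    for example in examples:
        if not SELECT_STYLE.match(example["sql"]):
            continue  # drops DDL/DML and other non-SELECT statements
        key = example_key(example)
        if key in seen:
            continue  # same question/SQL pair after normalization
        seen.add(key)
        kept.append(example)
    return kept
```

    Calling `clean(rows)` on a list of such dictionaries returns the surviving examples in their original order. A real pipeline would likely parse the SQL rather than rely on a prefix check, since a statement beginning with WITH can still end in a write.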

  16. Text2SQL_Workflow_Trace

    • huggingface.co
    Updated May 12, 2025
    Cite
    You Peng (2025). Text2SQL_Workflow_Trace [Dataset]. https://huggingface.co/datasets/fredpeng/Text2SQL_Workflow_Trace
    Dataset updated
    May 12, 2025
    Authors
    You Peng
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Text2SQL Workflow Trace

    Dataset Description

    This dataset contains workflow traces for Text-to-SQL tasks, capturing the intermediate steps of translating natural language queries to executable SQL. It was used as the input trace for the research presented in the paper "HEXGEN-TEXT2SQL: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL Workflow" (arXiv:2505.05286). The end-to-end Text-to-SQL queries collected in the dataset are from BIRD bench, and the trace… See the full description on the dataset page: https://huggingface.co/datasets/fredpeng/Text2SQL_Workflow_Trace.

  17. spider

    • huggingface.co
    • opendatalab.com
    Updated Dec 9, 2021
    Cite
    XLang NLP Lab (2021). spider [Dataset]. https://huggingface.co/datasets/xlangai/spider
    Dataset updated
    Dec 9, 2021
    Dataset authored and provided by
    XLang NLP Lab
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for Spider

    Dataset Summary

    Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases.

    Supported Tasks and Leaderboards

    The leaderboard can be seen at https://yale-lily.github.io/spider

    Languages

    The text in the dataset is in English.

    Dataset Structure

    Data… See the full description on the dataset page: https://huggingface.co/datasets/xlangai/spider.
