Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
BIRD-CRITIC-1.0-Flash
BIRD-Critic is the first SQL debugging benchmark designed to answer a critical question: Can large language models (LLMs) fix user issues in real-world database applications? Each task in BIRD-CRITIC has been verified by human experts on the following dimensions:
Reproduction of errors on BIRD env to prevent data leakage. Carefully curate test case functions for each task specifically. Soft EX: This metric can evaluate SELECT-ONLY tasks. Soft EX + Parsing:… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/bird-critic-1.0-flash-exp.
BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) represents a pioneering, cross-domain dataset that examines the impact of extensive database contents on text-to-SQL parsing. BIRD contains over 12,751 unique question-SQL pairs and 95 big databases with a total size of 33.4 GB. It also covers more than 37 professional domains, such as blockchain, hockey, healthcare and education, etc.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
BIRD-SQL Mini-Dev
Update 2025-07-04
We are grateful for the valuable feedback from the community over the past year regarding BIRD Mini-Dev. Based on your suggestions, we have made significant updates to the BIRD Mini-Dev dataset.
For New Users
If you are new to BIRD Mini-Dev, you can download the complete databases and datasets using the following link: Download BIRD Mini-Dev Complete Package
For Existing Users
If you have already downloaded the… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/bird_mini_dev.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Update 2025-05-22
The previous issue regarding mismatched MySQL instances has been resolved. The updated version of BIRD-CRITIC-Open is now available. Thank you for your patience and understanding.
Update 2025-04-25
We’ve identified a mismatch issue in some uploaded MySQL instances. Our team is actively working to resolve this, and we’ll release the updated version promptly. Please refrain from using MySQL until the fix is deployed. Apologies for any inconvenience caused.… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/bird-critic-1.0-open.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
🧸 Overview
BIRD-INTERACT, an interactive text-to-SQL benchmark, re-imagines Text-to-SQL evaluation via lens of dynamic interactions. The environment blends a hierarchical knowledge base, database documentation and a function-driven user simulator to recreate authentic enterprise environments across full CRUD operations. It offers two rigorous test modes: (1) passive Conversational Interaction and (2) active Agentic Interaction, spanning 600 annotated tasks including Business… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/bird-interact-lite.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Update 2025-06-08
We release the full version of BIRD-Critic-PG, a dataset containing 530 high-quality user issues focused on real-world PostgreSQL database applications. The schema file is include in the code repository https://github.com/bird-bench/BIRD-CRITIC-1/blob/main/baseline/data/post_schema.jsonl
BIRD-CRITIC-1.0-PG
BIRD-Critic is the first SQL debugging benchmark designed to answer a critical question: Can large language models (LLMs) fix user issues in… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/bird-critic-1.0-postgresql.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
🚀 LiveSQLBench-Base-Lite
A dynamic, contamination‑free benchmark for evaluating LLMs on complex, real‑world text‑to‑SQL tasks. 🌐 Website • 📄 Paper (coming soon) • 💻 GitHub Maintained by the 🦜 BIRD Team @ HKU & ☁️ Google Cloud
📊 LiveSQLBench Overview
LiveSQLBench (BIRD-SQL Pro v0.5) is a contamination-free, continuously evolving benchmark designed to evaluate LLMs on complex, real-world text-to-SQL tasks, featuring diverse real-world user queries, including… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/livesqlbench-base-lite.
Dataset Card for "BIRD-SQL-data"
More Information needed
BIRD-SQL
Data from BIRD-SQL benchmark dev set (last release Jul 3, 2024). Ref: https://bird-bench.github.io
benjamintli/BIRD-SQL-data-train-formatted dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
BIRD-RL
This dataset is a processed dataset of BIRD-SQL for Post-Training.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
BIRD-CRITIC-1.0-Flash
BIRD-Critic is the first SQL debugging benchmark designed to answer a critical question: Can large language models (LLMs) fix user issues in real-world database applications? Each task in BIRD-CRITIC has been verified by human experts on the following dimensions:
Reproduction of errors on BIRD env to prevent data leakage. Carefully curate test case functions for each task specifically. Soft EX: This metric can evaluate SELECT-ONLY tasks. Soft EX + Parsing:… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/bird-critic-1.0-flash-exp.