12 datasets found

bird-critic-1.0-flash-exp
huggingface.co
Cite
The BIRD Team, bird-critic-1.0-flash-exp [Dataset]. https://huggingface.co/datasets/birdsql/bird-critic-1.0-flash-exp
Dataset authored and provided by
The BIRD Team
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
BIRD-CRITIC-1.0-Flash

BIRD-Critic is the first SQL debugging benchmark designed to answer a critical question: Can large language models (LLMs) fix user issues in real-world database applications? Each task in BIRD-CRITIC has been verified by human experts on the following dimensions:

Reproduction of errors on the BIRD environment to prevent data leakage. Carefully curated test case functions for each task. Soft EX: This metric can evaluate SELECT-ONLY tasks. Soft EX + Parsing:… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/bird-critic-1.0-flash-exp.
BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation)...
paperswithcode.com
Updated Sep 24, 2024
Cite
Jinyang Li; Binyuan Hui; Ge Qu; Jiaxi Yang; Binhua Li; Bowen Li; Bailin Wang; Bowen Qin; Rongyu Cao; Ruiying Geng; Nan Huo; Xuanhe Zhou; Chenhao Ma; Guoliang Li; Kevin C. C. Chang; Fei Huang; Reynold Cheng; Yongbin Li (2024). BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) Dataset [Dataset]. https://paperswithcode.com/dataset/bird-sql
Dataset updated
Sep 24, 2024
Authors
Jinyang Li; Binyuan Hui; Ge Qu; Jiaxi Yang; Binhua Li; Bowen Li; Bailin Wang; Bowen Qin; Rongyu Cao; Ruiying Geng; Nan Huo; Xuanhe Zhou; Chenhao Ma; Guoliang Li; Kevin C. C. Chang; Fei Huang; Reynold Cheng; Yongbin Li
Description
BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) is a pioneering cross-domain dataset that examines the impact of extensive database contents on text-to-SQL parsing. BIRD contains over 12,751 unique question-SQL pairs and 95 large databases with a total size of 33.4 GB. It covers more than 37 professional domains, such as blockchain, hockey, healthcare, and education.
bird_mini_dev
huggingface.co
Updated Jul 26, 2024
Cite
The BIRD Team (2024). bird_mini_dev [Dataset]. https://huggingface.co/datasets/birdsql/bird_mini_dev
Dataset updated
Jul 26, 2024
Dataset authored and provided by
The BIRD Team
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
BIRD-SQL Mini-Dev

Update 2025-07-04

We are grateful for the valuable feedback from the community over the past year regarding BIRD Mini-Dev. Based on your suggestions, we have made significant updates to the BIRD Mini-Dev dataset.

For New Users

If you are new to BIRD Mini-Dev, you can download the complete databases and datasets using the following link: Download BIRD Mini-Dev Complete Package

For Existing Users

If you have already downloaded the… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/bird_mini_dev.
bird-critic-1.0-open
huggingface.co
Updated Apr 25, 2025
Cite
The BIRD Team (2025). bird-critic-1.0-open [Dataset]. https://huggingface.co/datasets/birdsql/bird-critic-1.0-open
Dataset updated
Apr 25, 2025
Dataset authored and provided by
The BIRD Team
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Update 2025-05-22

The previous issue regarding mismatched MySQL instances has been resolved. The updated version of BIRD-CRITIC-Open is now available. Thank you for your patience and understanding.

Update 2025-04-25

We’ve identified a mismatch issue in some uploaded MySQL instances. Our team is actively working to resolve this, and we’ll release the updated version promptly. Please refrain from using MySQL until the fix is deployed. Apologies for any inconvenience caused.… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/bird-critic-1.0-open.
bird-interact-lite
huggingface.co
Updated Jun 9, 2025
Cite
The BIRD Team (2025). bird-interact-lite [Dataset]. https://huggingface.co/datasets/birdsql/bird-interact-lite
Dataset updated
Jun 9, 2025
Dataset authored and provided by
The BIRD Team
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
🧸 Overview

BIRD-INTERACT, an interactive text-to-SQL benchmark, re-imagines text-to-SQL evaluation through the lens of dynamic interactions. The environment blends a hierarchical knowledge base, database documentation, and a function-driven user simulator to recreate authentic enterprise environments across full CRUD operations. It offers two rigorous test modes: (1) passive Conversational Interaction and (2) active Agentic Interaction, spanning 600 annotated tasks including Business… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/bird-interact-lite.
bird-critic-1.0-postgresql
huggingface.co
Updated Jun 8, 2025
Cite
The BIRD Team (2025). bird-critic-1.0-postgresql [Dataset]. https://huggingface.co/datasets/birdsql/bird-critic-1.0-postgresql
Dataset updated
Jun 8, 2025
Dataset authored and provided by
The BIRD Team
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Update 2025-06-08

We release the full version of BIRD-Critic-PG, a dataset containing 530 high-quality user issues focused on real-world PostgreSQL database applications. The schema file is included in the code repository: https://github.com/bird-bench/BIRD-CRITIC-1/blob/main/baseline/data/post_schema.jsonl

BIRD-CRITIC-1.0-PG

BIRD-Critic is the first SQL debugging benchmark designed to answer a critical question: Can large language models (LLMs) fix user issues in… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/bird-critic-1.0-postgresql.
livesqlbench-base-lite
huggingface.co
Cite
The BIRD Team, livesqlbench-base-lite [Dataset]. https://huggingface.co/datasets/birdsql/livesqlbench-base-lite
Dataset authored and provided by
The BIRD Team
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
🚀 LiveSQLBench-Base-Lite

A dynamic, contamination-free benchmark for evaluating LLMs on complex, real-world text-to-SQL tasks.
🌐 Website • 📄 Paper (coming soon) • 💻 GitHub
Maintained by the 🦜 BIRD Team @ HKU & ☁️ Google Cloud

📊 LiveSQLBench Overview

LiveSQLBench (BIRD-SQL Pro v0.5) is a contamination-free, continuously evolving benchmark designed to evaluate LLMs on complex, real-world text-to-SQL tasks, featuring diverse real-world user queries, including… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/livesqlbench-base-lite.
BIRD-SQL-data
huggingface.co
Cite
Wen-Ding Li, BIRD-SQL-data [Dataset]. https://huggingface.co/datasets/xu3kev/BIRD-SQL-data
Authors
Wen-Ding Li
Description
Dataset Card for "BIRD-SQL-data"

More Information needed
BIRD-SQL
huggingface.co
Cite
Deema, BIRD-SQL [Dataset]. https://huggingface.co/datasets/Deema/BIRD-SQL
Authors
Deema
Description
Deema/BIRD-SQL dataset hosted on Hugging Face and contributed by the HF Datasets community
bird
huggingface.co
Updated Jul 3, 2024
Cite
Mic (2024). bird [Dataset]. https://huggingface.co/datasets/micpst/bird
Dataset updated
Jul 3, 2024
Authors
Mic
Description
BIRD-SQL

Data from BIRD-SQL benchmark dev set (last release Jul 3, 2024). Ref: https://bird-bench.github.io
BIRD-SQL-data-train-formatted
huggingface.co
Updated Feb 6, 2024
Cite
Benjamin Li (2024). BIRD-SQL-data-train-formatted [Dataset]. https://huggingface.co/datasets/benjamintli/BIRD-SQL-data-train-formatted
Dataset updated
Feb 6, 2024
Authors
Benjamin Li
Description
benjamintli/BIRD-SQL-data-train-formatted dataset hosted on Hugging Face and contributed by the HF Datasets community
bird-rl
huggingface.co
Cite
Rihong, bird-rl [Dataset]. https://huggingface.co/datasets/Rihong/bird-rl
Authors
Rihong
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
BIRD-RL

This dataset is a processed dataset of BIRD-SQL for Post-Training.