56 datasets found

spider
huggingface.co
opendatalab.com
Updated Dec 9, 2021
XLang NLP Lab (2021). spider [Dataset]. https://huggingface.co/datasets/xlangai/spider
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 9, 2021
Dataset authored and provided by
XLang NLP Lab
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset Card for Spider

Dataset Summary

Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases.

Supported Tasks and Leaderboards

The leaderboard can be seen at https://yale-lily.github.io/spider

Languages

The text in the dataset is in English.

Dataset Structure Data… See the full description on the dataset page: https://huggingface.co/datasets/xlangai/spider.
spider-realistic
huggingface.co
opendatalab.com
Updated Feb 17, 2024
AhernTech s.r.o. (2024). spider-realistic [Dataset]. https://huggingface.co/datasets/aherntech/spider-realistic
Dataset updated
Feb 17, 2024
Dataset authored and provided by
AhernTech s.r.o.
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for Spider-Realistic

This dataset variant contains only the Spider Realistic dataset used in "Structure-Grounded Pretraining for Text-to-SQL". The dataset is created based on the dev split of the Spider dataset (2020-06-07 version from https://yale-lily.github.io/spider). The authors of the dataset modified the original questions to remove the explicit mention of column names while keeping the SQL queries unchanged to better evaluate the model's capability in aligning… See the full description on the dataset page: https://huggingface.co/datasets/aherntech/spider-realistic.
spider-schema
huggingface.co
Updated Jul 19, 2023
Richard R. (2023). spider-schema [Dataset]. https://huggingface.co/datasets/richardr1126/spider-schema
Dataset updated
Jul 19, 2023
Authors
Richard R.
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for Spider Schema

Dataset Summary

Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases. This dataset contains the 166 databases used in the Spider dataset.

Yale Lily Spider Leaderboards

The leaderboard can be seen at https://yale-lily.github.io/spider

Languages

The text in… See the full description on the dataset page: https://huggingface.co/datasets/richardr1126/spider-schema.
spider-FIT-en-extra-3enr-1enb
huggingface.co
Updated Jan 16, 2024
Marcelo Archanjo Jose (2024). spider-FIT-en-extra-3enr-1enb [Dataset]. https://huggingface.co/datasets/Marchanjo/spider-FIT-en-extra-3enr-1enb
Dataset updated
Jan 16, 2024
Authors
Marcelo Archanjo Jose
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Distributed under CC BY-SA 4.0, respecting the ShareAlike terms of the Spider dataset. Code explanations and links to the model checkpoints and datasets are in the mRAT-SQL GitHub repository. The model checkpoints and datasets can also be downloaded from the Hugging Face collection, but the GitHub repository is the better place to understand them.

mRAT-SQL-FIT A Multilingual Translator to SQL with Database Schema Pruning to Improve Self-Attention

Marcelo Archanjo Jose, Fabio… See the full description on the dataset page: https://huggingface.co/datasets/Marchanjo/spider-FIT-en-extra-3enr-1enb.
fixed_spider
huggingface.co
Updated Jun 14, 2024
Turbular (2024). fixed_spider [Dataset]. https://huggingface.co/datasets/Turbular/fixed_spider
Dataset updated
Jun 14, 2024
Dataset authored and provided by
Turbular
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Cleaned Spider Dataset for Text2SQL

Dataset Summary

The Cleaned Spider Dataset for Text2SQL is an improved version of the original Spider dataset, which is a large-scale, complex, and cross-domain semantic parsing and text-to-SQL dataset. This enhanced version addresses several critical issues found in the original dataset, ensuring higher quality and reliability for training text-to-SQL models. The enhancements were made possible through Turbular's advanced data… See the full description on the dataset page: https://huggingface.co/datasets/Turbular/fixed_spider.
spider-sql
huggingface.co
Updated Jun 30, 2024
Chapaneri (2024). spider-sql [Dataset]. https://huggingface.co/datasets/radhikachapaneri/spider-sql
Dataset updated
Jun 30, 2024
Authors
Chapaneri
Description
radhikachapaneri/spider-sql dataset hosted on Hugging Face and contributed by the HF Datasets community
spider-skeleton-context-instruct
huggingface.co
Updated Aug 9, 2023
Richard R. (2023). spider-skeleton-context-instruct [Dataset]. https://huggingface.co/datasets/richardr1126/spider-skeleton-context-instruct
Dataset updated
Aug 9, 2023
Authors
Richard R.
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for Spider Skeleton Context Instruct

Dataset Summary

Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases. This dataset was created to finetune LLMs in a ### Instruction: and ### Response: format with database context.
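
As a rough illustration of what an Instruction/Response layout with database context might look like, here is a minimal sketch. The function name `format_example`, the wording of the instruction, the placement of the schema inside the instruction, and the sample values are all assumptions made for illustration; the dataset's actual template may differ.

```python
def format_example(question: str, schema: str, sql: str) -> str:
    """Render one example in a '### Instruction:' / '### Response:' layout.

    Placing the schema inside the instruction is an assumption of this sketch.
    """
    instruction = (
        "Convert the question to a SQL query for the database below.\n"
        f"Database schema:\n{schema.strip()}\n"
        f"Question: {question.strip()}"
    )
    return f"### Instruction:\n{instruction}\n\n### Response:\n{sql.strip()}"


# Hypothetical example values; not taken from the dataset.
text = format_example(
    question="How many singers do we have?",
    schema="singer(singer_id, name, country, age)",
    sql="SELECT count(*) FROM singer",
)
print(text)
```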

Yale Lily Spider Leaderboards

The leaderboard can be seen at… See the full description on the dataset page: https://huggingface.co/datasets/richardr1126/spider-skeleton-context-instruct.
SPIDER
huggingface.co
Updated Feb 24, 2024
Chris Oswald (2024). SPIDER [Dataset]. https://huggingface.co/datasets/cdoswald/SPIDER
Dataset updated
Feb 24, 2024
Authors
Chris Oswald
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a large publicly available multi-center lumbar spine magnetic resonance imaging (MRI) dataset with reference segmentations of vertebrae, intervertebral discs (IVDs), and spinal canal. The dataset includes 447 sagittal T1 and T2 MRI series from 218 studies of 218 patients with a history of low back pain. The data was collected from four different hospitals. There is an additional hidden test set, not available here, used in the accompanying SPIDER challenge on spider.grand-challenge.org. We share this data to encourage wider participation and collaboration in the field of spine segmentation, and ultimately improve the diagnostic value of lumbar spine MRI.

This file also provides the biological sex for all patients and the age for the patients for whom it was available. It also includes a number of scanner and acquisition parameters for each individual MRI study. The dataset also comes with radiological gradings, found in a separate file, for the following degenerative changes:

1. Modic changes (type I, II or III)

2. Upper and lower endplate changes / Schmorl nodes (binary)

3. Spondylolisthesis (binary)

4. Disc herniation (binary)

5. Disc narrowing (binary)

6. Disc bulging (binary)

7. Pfirrmann grade (grade 1 to 5).

All radiological gradings are provided per IVD level.

Repository: https://zenodo.org/records/10159290
Paper: https://www.nature.com/articles/s41597-024-03090-w
spider-corpus-test
huggingface.co
Updated Sep 21, 2024
TARGET Benchmark (2024). spider-corpus-test [Dataset]. https://huggingface.co/datasets/target-benchmark/spider-corpus-test
Dataset updated
Sep 21, 2024
Authors
TARGET Benchmark
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Link to original dataset: https://yale-lily.github.io/spider

Yu, T., Zhang, R., Yang, K., Yasunaga, M., Wang, D., Li, Z., Ma, J., Li, I., Yao, Q., Roman, S. and Zhang, Z., 2018. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task. arXiv preprint arXiv:1809.08887.
spider
huggingface.co
Updated Apr 2, 2024
free (2024). spider [Dataset]. https://huggingface.co/datasets/nanina1/spider
Dataset updated
Apr 2, 2024
Authors
free
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
nanina1/spider dataset hosted on Hugging Face and contributed by the HF Datasets community
sql-create-context
opendatalab.com
huggingface.co
zip
Updated Apr 21, 2023
(2023). sql-create-context [Dataset]. https://opendatalab.com/OpenDataLab/sql-create-context
Available download formats: zip
Dataset updated
Apr 21, 2023
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset builds on WikiSQL and Spider. There are 78,577 examples of natural language queries, SQL CREATE TABLE statements, and SQL queries answering the question using the CREATE statement as context. This dataset was built with text-to-SQL LLMs in mind, intending to prevent the hallucination of column and table names often seen in models trained on text-to-SQL datasets. The CREATE TABLE statement can often be copied and pasted from different DBMSs and provides table names, column names, and their data types. By providing just the CREATE TABLE statement as context, we can hopefully provide better grounding for models without having to provide actual rows of data, limiting token usage and exposure to private, sensitive, or proprietary data.
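
The idea of using only a CREATE TABLE statement as context can be sketched roughly as follows. The function name `build_prompt`, the prompt layout, and the sample schema and question are assumptions made for illustration and are not taken from the dataset.

```python
def build_prompt(create_statement: str, question: str) -> str:
    """Combine a CREATE TABLE statement and a question into one prompt.

    Only the schema is included; no rows of data are passed to the model.
    """
    return (
        "Schema:\n"
        f"{create_statement.strip()}\n\n"
        f"Question: {question.strip()}\n"
        "SQL:"
    )


# Hypothetical schema and question, for illustration only.
context = "CREATE TABLE head (age INTEGER, name VARCHAR)"
prompt = build_prompt(context, "How many heads are older than 56?")
print(prompt)
```

Because the prompt carries the table and column names verbatim, a model has the exact identifiers in front of it and has less reason to invent them.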
spider
huggingface.co
Updated Jun 30, 2024
hy (2024). spider [Dataset]. https://huggingface.co/datasets/hyess/spider
Dataset updated
Jun 30, 2024
Authors
hy
License
https://choosealicense.com/licenses/other/
Description
hyess/spider dataset hosted on Hugging Face and contributed by the HF Datasets community
spider
huggingface.co
Updated Apr 18, 2025
Simone PAPICCHIO (2025). spider [Dataset]. https://huggingface.co/datasets/simone-papicchio/spider
Dataset updated
Apr 18, 2025
Authors
Simone PAPICCHIO
Description
simone-papicchio/spider dataset hosted on Hugging Face and contributed by the HF Datasets community
spider-context-validation
huggingface.co
Updated Jul 26, 2023
Richard R. (2023). spider-context-validation [Dataset]. https://huggingface.co/datasets/richardr1126/spider-context-validation
Dataset updated
Jul 26, 2023
Authors
Richard R.
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for Spider Context Validation

Dataset Summary

Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases. This dataset was created to validate Spider-fine-tuned LLMs with database context.

Yale Lily Spider Leaderboards

The leaderboard can be seen at https://yale-lily.github.io/spider… See the full description on the dataset page: https://huggingface.co/datasets/richardr1126/spider-context-validation.
spider-ko
huggingface.co
Updated Jun 3, 2025
Hugging Face KREW (2025). spider-ko [Dataset]. https://huggingface.co/datasets/huggingface-KREW/spider-ko
Dataset updated
Jun 3, 2025
Dataset provided by
Hugging Face: https://huggingface.co/
Authors
Hugging Face KREW
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
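Dataset Card for spider-ko: Korean Text-to-SQL Dataset

Dataset Summary

Spider-KO is a text-to-SQL dataset created by translating Yale University's Spider dataset into Korean. It was built by translating the natural language questions of the original Spider dataset into Korean. The dataset contains questions over databases from a variety of domains together with the corresponding SQL queries, and can be used to develop and evaluate Korean text-to-SQL models.

Supported Tasks and Leaderboards

text-to-sql: Used for the task of converting Korean natural language questions into SQL queries.

Languages

The questions in the dataset were translated into Korean (ko), while the SQL queries were kept in their English-based form. The original English questions are also provided.

Dataset Structure Data Fields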

db_id… See the full description on the dataset page: https://huggingface.co/datasets/huggingface-KREW/spider-ko.
spider-en-enr-enb
huggingface.co
Updated Jan 16, 2024
Marcelo Archanjo Jose (2024). spider-en-enr-enb [Dataset]. https://huggingface.co/datasets/Marchanjo/spider-en-enr-enb
Dataset updated
Jan 16, 2024
Authors
Marcelo Archanjo Jose
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Distributed under CC BY-SA 4.0, respecting the ShareAlike terms of the Spider dataset. Code explanations and links to the model checkpoints and datasets are in the mRAT-SQL GitHub repository. The model checkpoints and datasets can also be downloaded from the Hugging Face collection, but the GitHub repository is the better place to understand them.

mRAT-SQL-FIT A Multilingual Translator to SQL with Database Schema Pruning to Improve Self-Attention

Marcelo Archanjo Jose, Fabio… See the full description on the dataset page: https://huggingface.co/datasets/Marchanjo/spider-en-enr-enb.
spider-tableQA
huggingface.co
Updated Feb 20, 2024
Vaishali Pal (2024). spider-tableQA [Dataset]. https://huggingface.co/datasets/vaishali/spider-tableQA
Dataset updated
Feb 20, 2024
Authors
Vaishali Pal
Description
Dataset Card for "spider-tableQA"

Usage

import pandas as pd
from datasets import load_dataset

spider_tableQA = load_dataset("vaishali/spider-tableQA")

for sample in spider_tableQA['train']:
    question = sample['question']
    sql_query = sample['query']
    input_table_names = sample["table_names"]
    # each table is stored as a JSON string in pandas 'split' orientation
    input_tables = [pd.read_json(table, orient='split') for table in sample['tables']]
    answer = pd.read_json(sample['answer'], orient='split')

# flattened input/output… See the full description on the dataset page: https://huggingface.co/datasets/vaishali/spider-tableQA.
ORPRO-Spider-SQL-Filtered
huggingface.co
Updated Jan 9, 2025
Maurice (2025). ORPRO-Spider-SQL-Filtered [Dataset]. https://huggingface.co/datasets/mjerome89/ORPRO-Spider-SQL-Filtered
Dataset updated
Jan 9, 2025
Authors
Maurice
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
mjerome89/ORPRO-Spider-SQL-Filtered dataset hosted on Hugging Face and contributed by the HF Datasets community
new-spider-HM
huggingface.co
Updated Feb 29, 2024
HUSNA M (2024). new-spider-HM [Dataset]. https://huggingface.co/datasets/HusnaManakkot/new-spider-HM
Dataset updated
Feb 29, 2024
Authors
HUSNA M
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for Spider

Dataset Summary

Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases.

Supported Tasks and Leaderboards

The leaderboard can be seen at https://yale-lily.github.io/spider

Languages

The text in the dataset is in English.

Dataset Structure Data… See the full description on the dataset page: https://huggingface.co/datasets/HusnaManakkot/new-spider-HM.
spider-clean-text-to-sql-4
huggingface.co
Updated Jun 5, 2024
Victor Dubus-Chanson (2024). spider-clean-text-to-sql-4 [Dataset]. https://huggingface.co/datasets/VictorDCh/spider-clean-text-to-sql-4
Dataset updated
Jun 5, 2024
Authors
Victor Dubus-Chanson
Description
VictorDCh/spider-clean-text-to-sql-4 dataset hosted on Hugging Face and contributed by the HF Datasets community

xlangai/spider: 2 scholarly articles cite this dataset.