9 datasets found

spider
huggingface.co
opendatalab.com
Updated Dec 9, 2021
+ more versions
Cite
XLang NLP Lab (2021). spider [Dataset]. https://huggingface.co/datasets/xlangai/spider
Available format: Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 9, 2021
Dataset authored and provided by
XLang NLP Lab
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset Card for Spider

Dataset Summary

Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases.

Supported Tasks and Leaderboards

The leaderboard can be seen at https://yale-lily.github.io/spider

Languages

The text in the dataset is in English.

Dataset Structure Data… See the full description on the dataset page: https://huggingface.co/datasets/xlangai/spider.
Spider-Realistic
opendatalab.com
huggingface.co
zip
Updated Aug 16, 2021
+ more versions
Cite
The Ohio State University (2021). Spider-Realistic [Dataset]. https://opendatalab.com/OpenDataLab/Spider-Realistic
Available download formats: zip
Dataset updated
Aug 16, 2021
Dataset provided by
Microsoft: http://microsoft.com/
The Ohio State University
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Spider-Realistic dataset is used for evaluation in the paper "Structure-Grounded Pretraining for Text-to-SQL". It was created from the dev split of the Spider dataset (2020-06-07 version from https://yale-lily.github.io/spider). We manually modified the original questions to remove explicit mentions of column names while keeping the SQL queries unchanged, to better evaluate a model's ability to align the NL utterance with the DB schema. For more details, please check our paper at https://arxiv.org/abs/2010.12773.
spider-schema
huggingface.co
Updated Jul 19, 2023
Cite
Richard R. (2023). spider-schema [Dataset]. https://huggingface.co/datasets/richardr1126/spider-schema
Available format: Croissant
Dataset updated
Jul 19, 2023
Authors
Richard R.
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for Spider Schema

Dataset Summary

Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases. This dataset contains the 166 databases used in the Spider dataset.

Yale Lily Spider Leaderboards

The leaderboard can be seen at https://yale-lily.github.io/spider

Languages

The text in… See the full description on the dataset page: https://huggingface.co/datasets/richardr1126/spider-schema.
spider-context-validation
huggingface.co
Updated Jul 26, 2023
+ more versions
Cite
Richard R. (2023). spider-context-validation [Dataset]. https://huggingface.co/datasets/richardr1126/spider-context-validation
Available format: Croissant
Dataset updated
Jul 26, 2023
Authors
Richard R.
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for Spider Context Validation

Dataset Summary

Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases. This dataset was created to validate Spider-fine-tuned LLMs with database context.

Yale Lily Spider Leaderboards

The leaderboard can be seen at https://yale-lily.github.io/spider… See the full description on the dataset page: https://huggingface.co/datasets/richardr1126/spider-context-validation.
spider-skeleton-context-instruct
huggingface.co
Updated Aug 9, 2023
+ more versions
Cite
Richard R. (2023). spider-skeleton-context-instruct [Dataset]. https://huggingface.co/datasets/richardr1126/spider-skeleton-context-instruct
Available format: Croissant
Dataset updated
Aug 9, 2023
Authors
Richard R.
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for Spider Skeleton Context Instruct

Dataset Summary

Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases. This dataset was created to fine-tune LLMs in a "### Instruction:" and "### Response:" format with database context.
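
As a rough illustration of what an instruction/response format with database context can look like, the sketch below assembles one training string. The template wording, the function names (build_prompt, build_example), and the toy schema, question, and query are all assumptions made for illustration; the dataset's actual template and field names may differ.

```python
def build_prompt(question: str, db_context: str) -> str:
    """Assemble a hypothetical instruction-style prompt with database context."""
    # Assumed wording: the real dataset may phrase the instruction differently.
    instruction = (
        "Convert the question to a SQL query using the database schema.\n"
        f"Schema: {db_context}\n"
        f"Question: {question}"
    )
    return f"### Instruction:\n{instruction}\n\n### Response:\n"


def build_example(question: str, db_context: str, sql: str) -> str:
    """Append the target SQL to the prompt to form one fine-tuning string."""
    return build_prompt(question, db_context) + sql


# Toy values, invented for this sketch (not taken from the dataset).
toy_schema = "singer(singer_id, name, age)"
toy_question = "How many singers are there?"
toy_sql = "SELECT count(*) FROM singer"

prompt = build_prompt(toy_question, toy_schema)
example = build_example(toy_question, toy_schema, toy_sql)
print(example)
```

At inference time only the prompt part would be passed to the model, which is then expected to continue after the "### Response:" marker.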

Yale Lily Spider Leaderboards

The leaderboard can be seen at… See the full description on the dataset page: https://huggingface.co/datasets/richardr1126/spider-skeleton-context-instruct.
new-spider-HM
huggingface.co
Updated Feb 29, 2024
Cite
HUSNA M (2024). new-spider-HM [Dataset]. https://huggingface.co/datasets/HusnaManakkot/new-spider-HM
Available format: Croissant
Dataset updated
Feb 29, 2024
Authors
HUSNA M
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for Spider

Dataset Summary

Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases.

Supported Tasks and Leaderboards

The leaderboard can be seen at https://yale-lily.github.io/spider

Languages

The text in the dataset is in English.

Dataset Structure Data… See the full description on the dataset page: https://huggingface.co/datasets/HusnaManakkot/new-spider-HM.
spider-corpus-train
huggingface.co
Updated Sep 21, 2024
+ more versions
Cite
TARGET Benchmark (2024). spider-corpus-train [Dataset]. https://huggingface.co/datasets/target-benchmark/spider-corpus-train
Available format: Croissant
Dataset updated
Sep 21, 2024
Authors
TARGET Benchmark
Description
Link to original dataset: https://yale-lily.github.io/spider
Yu, T., Zhang, R., Yang, K., Yasunaga, M., Wang, D., Li, Z., Ma, J., Li, I., Yao, Q., Roman, S. and Zhang, Z., 2018. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task. arXiv preprint arXiv:1809.08887.
spider-context-validation-ranked-schema
huggingface.co
Updated Sep 17, 2023
Cite
Richard R. (2023). spider-context-validation-ranked-schema [Dataset]. https://huggingface.co/datasets/richardr1126/spider-context-validation-ranked-schema
Available format: Croissant
Dataset updated
Sep 17, 2023
Authors
Richard R.
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for Spider Context Validation

Ranked Schema by ChatGPT

The database context used here was generated by ChatGPT, which was asked to reorder the schema so that the most relevant columns appear at the beginning of the db_info.

Dataset Summary

Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases.… See the full description on the dataset page: https://huggingface.co/datasets/richardr1126/spider-context-validation-ranked-schema.
spider-ko
huggingface.co
Updated Jun 3, 2025
Cite
Hugging Face KREW (2025). spider-ko [Dataset]. https://huggingface.co/datasets/huggingface-KREW/spider-ko
Dataset updated
Jun 3, 2025
Dataset provided by
Hugging Face: https://huggingface.co/
Authors
Hugging Face KREW
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
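Dataset Card for spider-ko: Korean Text-to-SQL Dataset

Dataset Summary

Spider-KO is a text-to-SQL dataset created by translating Yale University's Spider dataset into Korean. It was built by translating the natural language questions of the original Spider dataset into Korean. The dataset contains questions over databases from a variety of domains together with the corresponding SQL queries, and can be used to develop and evaluate Korean text-to-SQL models.

Supported Tasks and Leaderboards

text-to-sql: Used for the task of converting Korean natural language questions into SQL queries.

Languages

The questions in the dataset were translated into Korean (ko), while the SQL queries were kept in their English-based form. The original English questions are also provided.

Dataset Structure Data Fields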

db_id… See the full description on the dataset page: https://huggingface.co/datasets/huggingface-KREW/spider-ko.

2 scholarly articles cite the spider dataset (xlangai/spider).
