9 datasets found
  1. h

    spider

    • huggingface.co
    • opendatalab.com
    Updated Dec 9, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    XLang NLP Lab (2021). spider [Dataset]. https://huggingface.co/datasets/xlangai/spider
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 9, 2021
    Dataset authored and provided by
    XLang NLP Lab
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for Spider

      Dataset Summary
    

    Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases.

      Supported Tasks and Leaderboards
    

    The leaderboard can be seen at https://yale-lily.github.io/spider

      Languages
    

    The text in the dataset is in English.

      Dataset Structure
    
    
    
    
    
      Data… See the full description on the dataset page: https://huggingface.co/datasets/xlangai/spider.
    
  2. Spider-Realistic

    • opendatalab.com
    • huggingface.co
    zip
    Updated Aug 16, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Ohio State University (2021). Spider-Realistic [Dataset]. https://opendatalab.com/OpenDataLab/Spider-Realistic
    Explore at:
    zipAvailable download formats
    Dataset updated
    Aug 16, 2021
    Dataset provided by
    微軟http://microsoft.com/
    The Ohio State University
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Spider-Realistic dataset is used for evaluation in the paper "Structure-Grounded Pretraining for Text-to-SQL". The dataset is created based on the dev split of the Spider dataset (2020-06-07 version from https://yale-lily.github.io/spider). We manually modified the original questions to remove the explicit mention of column names while keeping the SQL queries unchanged to better evaluate the model's capability in aligning the NL utterance and the DB schema. For more details, please check our paper at https://arxiv.org/abs/2010.12773.

  3. h

    spider-schema

    • huggingface.co
    Updated Jul 19, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Richard R. (2023). spider-schema [Dataset]. https://huggingface.co/datasets/richardr1126/spider-schema
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 19, 2023
    Authors
    Richard R.
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for Spider Schema

      Dataset Summary
    

    Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases. This dataset contains the 166 databases used in the Spider dataset.

      Yale Lily Spider Leaderboards
    

    The leaderboard can be seen at https://yale-lily.github.io/spider

      Languages
    

    The text in… See the full description on the dataset page: https://huggingface.co/datasets/richardr1126/spider-schema.

  4. h

    spider-context-validation

    • huggingface.co
    Updated Jul 26, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Richard R. (2023). spider-context-validation [Dataset]. https://huggingface.co/datasets/richardr1126/spider-context-validation
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 26, 2023
    Authors
    Richard R.
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for Spider Context Validation

      Dataset Summary
    

    Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases. This dataset was created to validate spider-fine-tuned LLMs with database context.

      Yale Lily Spider Leaderboards
    

    The leaderboard can be seen at https://yale-lily.github.io/spider… See the full description on the dataset page: https://huggingface.co/datasets/richardr1126/spider-context-validation.

  5. h

    spider-skeleton-context-instruct

    • huggingface.co
    Updated Aug 9, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Richard R. (2023). spider-skeleton-context-instruct [Dataset]. https://huggingface.co/datasets/richardr1126/spider-skeleton-context-instruct
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 9, 2023
    Authors
    Richard R.
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for Spider Skeleton Context Instruct

      Dataset Summary
    

    Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases. This dataset was created to finetune LLMs in a ### Instruction: and ### Response: format with database context.

      Yale Lily Spider Leaderboards
    

    The leaderboard can be seen at… See the full description on the dataset page: https://huggingface.co/datasets/richardr1126/spider-skeleton-context-instruct.

  6. h

    new-spider-HM

    • huggingface.co
    Updated Feb 29, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    HUSNA M (2024). new-spider-HM [Dataset]. https://huggingface.co/datasets/HusnaManakkot/new-spider-HM
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 29, 2024
    Authors
    HUSNA M
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for Spider

      Dataset Summary
    

    Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases

      Supported Tasks and Leaderboards
    

    The leaderboard can be seen at https://yale-lily.github.io/spider

      Languages
    

    The text in the dataset is in English.

      Dataset Structure
    
    
    
    
    
      Data… See the full description on the dataset page: https://huggingface.co/datasets/HusnaManakkot/new-spider-HM.
    
  7. h

    spider-corpus-train

    • huggingface.co
    Updated Sep 21, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    TARGET Benchmark (2024). spider-corpus-train [Dataset]. https://huggingface.co/datasets/target-benchmark/spider-corpus-train
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 21, 2024
    Authors
    TARGET Benchmark
    Description

    Link to original dataset: https://yale-lily.github.io/spider Yu, T., Zhang, R., Yang, K., Yasunaga, M., Wang, D., Li, Z., Ma, J., Li, I., Yao, Q., Roman, S. and Zhang, Z., 2018. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task. arXiv preprint arXiv:1809.08887.

  8. h

    spider-context-validation-ranked-schema

    • huggingface.co
    Updated Sep 17, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Richard R. (2023). spider-context-validation-ranked-schema [Dataset]. https://huggingface.co/datasets/richardr1126/spider-context-validation-ranked-schema
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 17, 2023
    Authors
    Richard R.
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for Spider Context Validation

      Ranked Schema by ChatGPT
    

    The database context used here is generated from ChatGPT after telling it to reorder the schema with the most relevant columns in the beginning of the db_info.

      Dataset Summary
    

    Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases.… See the full description on the dataset page: https://huggingface.co/datasets/richardr1126/spider-context-validation-ranked-schema.

  9. spider-ko

    • huggingface.co
    Updated Jun 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hugging Face KREW (2025). spider-ko [Dataset]. https://huggingface.co/datasets/huggingface-KREW/spider-ko
    Explore at:
    Dataset updated
    Jun 3, 2025
    Dataset provided by
    Hugging Facehttps://huggingface.co/
    Authors
    Hugging Face KREW
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for spider-ko: 한국어 Text-to-SQL 데이터셋

      데이터셋 요약
    

    Spider-KO는 Yale University의 Spider 데이터셋을 한국어로 번역한 텍스트-SQL 변환 데이터셋입니다. 원본 Spider 데이터셋의 자연어 질문을 한국어로 번역하여 구성하였습니다. 이 데이터셋은 다양한 도메인의 데이터베이스에 대한 질의와 해당 SQL 쿼리를 포함하고 있으며, 한국어 Text-to-SQL 모델 개발 및 평가에 활용될 수 있습니다.

      지원 태스크 및 리더보드
    

    text-to-sql: 한국어 자연어 질문을 SQL 쿼리로 변환하는 태스크에 사용됩니다.

      언어
    

    데이터셋의 질문은 한국어(ko)로 번역되었으며, SQL 쿼리는 영어 기반으로 유지되었습니다. 원본 영어 질문도 함께 제공됩니다.

      데이터셋 구조
    
    
    
    
    
      데이터 필드
    

    db_id… See the full description on the dataset page: https://huggingface.co/datasets/huggingface-KREW/spider-ko.

  10. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
XLang NLP Lab (2021). spider [Dataset]. https://huggingface.co/datasets/xlangai/spider

spider

Spider

xlangai/spider

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 9, 2021
Dataset authored and provided by
XLang NLP Lab
License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Description

Dataset Card for Spider

  Dataset Summary

Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases.

  Supported Tasks and Leaderboards

The leaderboard can be seen at https://yale-lily.github.io/spider

  Languages

The text in the dataset is in English.

  Dataset Structure





  Data… See the full description on the dataset page: https://huggingface.co/datasets/xlangai/spider.
Search
Clear search
Close search
Google apps
Main menu