34 datasets found
  1. spider

    • huggingface.co
    • opendatalab.com
    Updated Dec 9, 2021
    + more versions
    Cite
    XLang NLP Lab (2021). spider [Dataset]. https://huggingface.co/datasets/xlangai/spider
    Dataset updated
    Dec 9, 2021
    Dataset authored and provided by
    XLang NLP Lab
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for Spider

      Dataset Summary
    

    Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases.

      Supported Tasks and Leaderboards
    

    The leaderboard can be seen at https://yale-lily.github.io/spider

      Languages
    

    The text in the dataset is in English.

      Dataset Structure

      Data… See the full description on the dataset page: https://huggingface.co/datasets/xlangai/spider.
    
  2. Spider-Realistic Dataset

    • paperswithcode.com
    • opendatalab.com
    • +1more
    Updated Sep 11, 2021
    + more versions
    Cite
    Xiang Deng; Ahmed Hassan Awadallah; Christopher Meek; Oleksandr Polozov; Huan Sun; Matthew Richardson (2021). Spider-Realistic Dataset [Dataset]. https://paperswithcode.com/dataset/spider-realistic
    Dataset updated
    Sep 11, 2021
    Authors
    Xiang Deng; Ahmed Hassan Awadallah; Christopher Meek; Oleksandr Polozov; Huan Sun; Matthew Richardson
    Description

    The Spider-Realistic dataset is used for evaluation in the paper "Structure-Grounded Pretraining for Text-to-SQL". It is based on the dev split of the Spider dataset (2020-06-07 version from https://yale-lily.github.io/spider). We manually modified the original questions to remove explicit mentions of column names while keeping the SQL queries unchanged, to better evaluate a model's ability to align the NL utterance with the DB schema. For more details, please see our paper at https://arxiv.org/abs/2010.12773.

  3. spider-schema

    • huggingface.co
    Updated Jul 19, 2023
    Cite
    Richard R. (2023). spider-schema [Dataset]. https://huggingface.co/datasets/richardr1126/spider-schema
    Dataset updated
    Jul 19, 2023
    Authors
    Richard R.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for Spider Schema

      Dataset Summary
    

    Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases. This dataset contains the 166 databases used in the Spider dataset.

      Yale Lily Spider Leaderboards
    

    The leaderboard can be seen at https://yale-lily.github.io/spider

      Languages
    

    The text in… See the full description on the dataset page: https://huggingface.co/datasets/richardr1126/spider-schema.

  4. spider-skeleton-context-instruct

    • huggingface.co
    Updated Aug 9, 2023
    + more versions
    Cite
    Richard R. (2023). spider-skeleton-context-instruct [Dataset]. https://huggingface.co/datasets/richardr1126/spider-skeleton-context-instruct
    Dataset updated
    Aug 9, 2023
    Authors
    Richard R.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for Spider Skeleton Context Instruct

      Dataset Summary
    

    Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases. This dataset was created to fine-tune LLMs in a "### Instruction:" and "### Response:" format with database context.
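
    As a rough illustration of the instruction/response layout described above, the sketch below assembles one prompt from a question and a schema string. The function name build_prompt, the instruction wording, and the sample schema are assumptions made for illustration; the dataset's actual template may differ.

```python
from typing import Optional


def build_prompt(question: str, schema: str, sql: Optional[str] = None) -> str:
    """Format one example in an '### Instruction:' / '### Response:' layout.

    The wording of the instruction is hypothetical, not the dataset's template.
    """
    instruction = (
        "Convert the question to a SQL query for the database below.\n\n"
        f"Schema:\n{schema}\n\n"
        f"Question: {question}"
    )
    prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    # For training, append the gold SQL; for inference, leave the response open.
    return prompt + sql if sql is not None else prompt


# Hypothetical schema and question, used only to show the shape of the output.
example = build_prompt(
    question="How many singers are there?",
    schema="CREATE TABLE singer (singer_id INTEGER, name TEXT, age INTEGER);",
    sql="SELECT count(*) FROM singer",
)
print(example)
```

    Leaving sql unset yields a prompt that ends right after "### Response:", which is the form a fine-tuned model would be asked to complete.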

      Yale Lily Spider Leaderboards
    

    The leaderboard can be seen at… See the full description on the dataset page: https://huggingface.co/datasets/richardr1126/spider-skeleton-context-instruct.

  5. Spider 2.0 Dataset

    • paperswithcode.com
    Updated Jun 9, 2025
    Cite
    Fangyu Lei; Jixuan Chen; Yuxiao Ye; Ruisheng Cao; Dongchan Shin; Hongjin Su; Zhaoqing Suo; Hongcheng Gao; Wenjing Hu; Pengcheng Yin; Victor Zhong; Caiming Xiong; Ruoxi Sun; Qian Liu; Sida Wang; Tao Yu (2025). Spider 2.0 Dataset [Dataset]. https://paperswithcode.com/dataset/spider-2-0
    Dataset updated
    Jun 9, 2025
    Authors
    Fangyu Lei; Jixuan Chen; Yuxiao Ye; Ruisheng Cao; Dongchan Shin; Hongjin Su; Zhaoqing Suo; Hongcheng Gao; Wenjing Hu; Pengcheng Yin; Victor Zhong; Caiming Xiong; Ruoxi Sun; Qian Liu; Sida Wang; Tao Yu
    Description

    Spider 2.0 is a comprehensive code generation agent task that includes 632 examples. The agent has to interactively explore various types of databases, such as BigQuery, Snowflake, Postgres, ClickHouse, DuckDB, and SQLite. It is required to engage with complex SQL workflows, process extensive contexts, perform intricate reasoning, and generate multiple SQL queries with diverse operations, often exceeding 100 lines across multiple interactions.

  6. TURSpider: A Turkish Text-to-SQL Dataset Dataset

    • paperswithcode.com
    Updated Nov 20, 2024
    + more versions
    Cite
    (2024). TURSpider: A Turkish Text-to-SQL Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/turspider
    Dataset updated
    Nov 20, 2024
    Description

    TURSpider is a novel Turkish Text-to-SQL dataset that includes complex queries, akin to those in the original Spider dataset. TURSpider dataset comprises two main subsets: a dev set and a training set, aligned with the structure and scale of the popular Spider dataset. The dev set contains 1034 data rows with 1023 unique questions and 584 distinct SQL queries. In the training set, there are 8659 data rows, 8506 unique questions, and corresponding SQL queries.

  7. SpiderDec, the decomposed version of the Spider dev data set

    • data.4tu.nl
    zip
    Updated Sep 5, 2022
    Cite
    Sara Salimzadeh; Ujwal Gadiraju; Claudia Hauff; Arie Van Deursen (2022). SpiderDec, the decomposed version of the Spider dev data set [Dataset]. http://doi.org/10.4121/66ab9ab1-a08a-4c63-bd92-d11e2c3c06f8.v1
    Available download formats: zip
    Dataset updated
    Sep 5, 2022
    Dataset provided by
    4TU.ResearchData
    Authors
    Sara Salimzadeh; Ujwal Gadiraju; Claudia Hauff; Arie Van Deursen
    License

    https://www.apache.org/licenses/LICENSE-2.0.html

    Description

    SpiderDec is an extension of the Spider Dataset. The original Spider dataset split the data into training, development, and a hidden test set. For this new dataset, we manually decomposed the questions and corresponding queries within the development set of the Spider dataset, focusing on those with hard and extra hard SQL queries. The result of this effort is the creation of SpiderDec.

  8. spider-sql-prompts

    • huggingface.co
    Updated Jun 30, 2024
    + more versions
    Cite
    Chapaneri (2024). spider-sql-prompts [Dataset]. https://huggingface.co/datasets/radhikachapaneri/spider-sql-prompts
    Dataset updated
    Jun 30, 2024
    Authors
    Chapaneri
    Description

    radhikachapaneri/spider-sql-prompts dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. MultiSpider Dataset

    • paperswithcode.com
    Updated Jun 4, 2024
    Cite
    Longxu Dou; Yan Gao; Mingyang Pan; Dingzirui Wang; Wanxiang Che; Dechen Zhan; Jian-Guang Lou (2024). MultiSpider Dataset [Dataset]. https://paperswithcode.com/dataset/multispider
    Dataset updated
    Jun 4, 2024
    Authors
    Longxu Dou; Yan Gao; Mingyang Pan; Dingzirui Wang; Wanxiang Che; Dechen Zhan; Jian-Guang Lou
    Description

    MultiSpider is a large multilingual text-to-SQL dataset which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese).

  10. Data from: Text to SQL dataset

    • kaggle.com
    Updated Jul 21, 2024
    Cite
    Mohammad Nour Alawad (2024). Text to SQL dataset [Dataset]. https://www.kaggle.com/datasets/mohammadnouralawad/spider-text-sql
    Dataset updated
    Jul 21, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mohammad Nour Alawad
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset consists of 8,034 entries designed to evaluate the performance of text-to-SQL models. Each entry contains a natural language text query and its corresponding SQL command. The dataset is a subset derived from the Spider dataset, focusing on diverse and complex queries to challenge the understanding and generation capabilities of machine learning models.

  11. Text-to-SQL Model Evaluation Data

    • opendatabay.com
    Updated Jul 3, 2025
    Cite
    Datasimple (2025). Text-to-SQL Model Evaluation Data [Dataset]. https://www.opendatabay.com/data/science-research/03ab3b68-bf0d-44b9-a1cb-cab6072881bb
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    Area covered
    Data Science and Analytics
    Description

    This dataset comprises 8,034 entries designed to assess the performance of text-to-SQL models. Each entry includes a natural language text query and its corresponding SQL command. It is a subset derived from the Spider dataset, focusing on diverse and complex queries to challenge machine learning models' understanding and generation capabilities. This is a free dataset, ideal for data science and analytics, particularly in natural language processing and deep learning applications.

    Columns

    • text_query: This column contains natural language queries in text format.
    • sql_command: This column contains the corresponding SQL commands generated from the text queries.

    Distribution

    The dataset consists of 8,034 entries. The text_query column features 7,990 unique values, whilst the sql_command column contains 4,525 unique values. Data files are typically provided in CSV format, and a sample file will be updated separately to the platform. The dataset is currently at version 1.0.
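
    The row and unique-value counts quoted above can be recomputed from a CSV export with a few lines of standard-library Python. This is a minimal sketch: the function name column_stats is an assumption, and the three sample rows are made-up stand-ins rather than records from the dataset.

```python
import csv
import io


def column_stats(csv_text: str) -> dict:
    """Return the row count plus the number of distinct values per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    stats = {"rows": len(rows)}
    for name in reader.fieldnames or []:
        # A set keeps one copy of each value, so its size is the unique count.
        stats[name] = len({row[name] for row in rows})
    return stats


# Hypothetical sample using the two column names listed in the description.
SAMPLE = """text_query,sql_command
How many singers are there?,SELECT count(*) FROM singer
What is the number of singers?,SELECT count(*) FROM singer
List all stadium names.,SELECT name FROM stadium
"""

print(column_stats(SAMPLE))
```

    Fewer unique sql_command values than rows, as in the listing, indicates that several differently worded questions map to the same SQL command.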

    Usage

    This dataset is ideal for evaluating the performance of text-to-SQL models. It can be utilised to challenge and enhance the understanding and generation capabilities of various machine learning models, especially within the domains of natural language processing and deep learning research and development.

    Coverage

    The dataset's regional scope is global. The listing date for this dataset is noted as 08/06/2025.

    License

    CC-BY-SA

    Who Can Use It

    This dataset is intended for:

    • Machine learning researchers and developers who aim to train, test, or validate text-to-SQL models.
    • Professionals and academics working in natural language processing (NLP) and deep learning.
    • Data scientists and analysts focused on building and evaluating artificial intelligence models for natural language understanding and SQL generation.

    Dataset Name Suggestions

    • Text to SQL Dataset
    • Natural Language to SQL Commands
    • SQL Command Generation Dataset
    • Text-to-SQL Model Evaluation Data

    Attributes

    Original Data Source: Text to SQL dataset

  12. Contextual Portuguese Text2SQL

    • opendatabay.com
    Updated Jul 7, 2025
    Cite
    Datasimple (2025). Contextual Portuguese Text2SQL [Dataset]. https://www.opendatabay.com/data/ai-ml/e4213f60-3136-497b-a7ac-09504fbd0b79
    Dataset updated
    Jul 7, 2025
    Dataset authored and provided by
    Datasimple
    Area covered
    Data Science and Analytics
    Description

    This dataset is a Portuguese-translated version of the b-mc2/sql-create-context dataset, constructed from the WikiSQL and Spider datasets. It contains examples of questions in Portuguese, SQL CREATE TABLE statements, and SQL queries that answer the questions using the CREATE TABLE statement as context. The main goal of this dataset is to assist Portuguese natural language models in generating precise and contextualised SQL queries, preventing the hallucination of column and table names, a common issue in text-to-SQL datasets. By providing only the CREATE TABLE statement as context, the dataset aims to better ground the models without the need to provide actual data rows, limiting token use and exposure to private, sensitive, or proprietary data.

    Columns

    • pergunta: The question in natural language about the database, in Portuguese.
    • contexto: The SQL CREATE TABLE statement that provides the necessary context to answer the question, representing the schema or structure of the database tables, in Portuguese.
    • resposta: The SQL query that answers the question based on the provided context, in Portuguese.

    Distribution

    This dataset consists of 78,577 entries. Each entry represents a question about a database, the context of the database schema, and the corresponding SQL query. Data files are typically in CSV format. The 'pergunta' column contains 78,220 unique values, 'contexto' has 72,947 unique values, and 'resposta' has 78,577 unique values.
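
    To illustrate how a CREATE TABLE statement alone can ground a query, the sketch below loads a context into an empty in-memory SQLite database and checks whether a query compiles against it. The function name query_fits_context and the sample schema are assumptions for illustration, and the approach presumes the context is valid SQLite syntax, which may not hold for every row.

```python
import sqlite3


def query_fits_context(context: str, query: str) -> bool:
    """Return True if `query` compiles against the tables defined in `context`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(context)        # create the (empty) tables
        conn.execute(f"EXPLAIN {query}")   # compiles the query; no data rows needed
        return True
    except sqlite3.Error:
        # Raised for unknown tables or columns, or for malformed SQL.
        return False
    finally:
        conn.close()


# Hypothetical 'contexto' value and two candidate 'resposta' queries.
contexto = "CREATE TABLE singer (singer_id INTEGER, name TEXT, age INTEGER);"
print(query_fits_context(contexto, "SELECT name FROM singer WHERE age > 30"))
print(query_fits_context(contexto, "SELECT surname FROM singer"))
```

    Because no rows are inserted, the check exercises only table and column names, which matches the stated aim of grounding a model without exposing actual data.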

    Usage

    This dataset is ideal for:

    • Training natural language models for SQL query generation, especially in scenarios where accuracy in naming columns and tables is crucial.
    • Enhancing model performance in text-to-SQL tasks.
    • Supporting natural language processing and machine learning tasks related to generating structured queries from natural language.

    Coverage

    The dataset has a global region scope and focuses on the Portuguese language. The questions were translated into Portuguese using the facebook/nllb-200-distilled-1.3B model. It was listed on 22/06/2025.

    License

    CC-BY-NC

    Who Can Use It

    This dataset is suitable for:

    • Data scientists and analysts focused on developing and refining natural language processing models.
    • Researchers and developers working on text-to-SQL solutions.
    • Anyone aiming to build or improve AI models that translate natural language queries into SQL, particularly for Portuguese.

    Dataset Name Suggestions

    • Portuguese Text2SQL Database
    • NL to SQL Portuguese Dataset
    • SQL Query Generation from Portuguese Text
    • Portuguese Natural Language to SQL
    • Contextual Portuguese Text2SQL

    Attributes

    Original Data Source: Portuguese Text2SQL database

  13. spider-context-validation

    • huggingface.co
    Updated Jul 26, 2023
    + more versions
    Cite
    Richard R. (2023). spider-context-validation [Dataset]. https://huggingface.co/datasets/richardr1126/spider-context-validation
    Dataset updated
    Jul 26, 2023
    Authors
    Richard R.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for Spider Context Validation

      Dataset Summary
    

    Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases. This dataset was created to validate Spider-fine-tuned LLMs with database context.

      Yale Lily Spider Leaderboards
    

    The leaderboard can be seen at https://yale-lily.github.io/spider… See the full description on the dataset page: https://huggingface.co/datasets/richardr1126/spider-context-validation.

  14. ORPRO-Spider-SQL-Filtered

    • huggingface.co
    Updated Jan 9, 2025
    + more versions
    Cite
    Maurice (2025). ORPRO-Spider-SQL-Filtered [Dataset]. https://huggingface.co/datasets/mjerome89/ORPRO-Spider-SQL-Filtered
    Dataset updated
    Jan 9, 2025
    Authors
    Maurice
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    mjerome89/ORPRO-Spider-SQL-Filtered dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. SQL-Eval Dataset

    • paperswithcode.com
    Updated Nov 15, 2024
    Cite
    (2024). SQL-Eval Dataset [Dataset]. https://paperswithcode.com/dataset/sql-eval-1
    Dataset updated
    Nov 15, 2024
    Description

    SQL-Eval is an open-source PostgreSQL evaluation dataset released by Defog, constructed based on Spider. The original link can be found at https://github.com/defog-ai/sql-eval. Our evaluation methodology is more stringent, as it compares the execution accuracy of the predicted SQL queries against the sole ground truth SQL query.
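
    Execution accuracy against a single ground-truth query can be sketched as follows: run both queries on the same database and compare the returned rows. This is not Defog's evaluation harness; it uses SQLite from the Python standard library as a stand-in for PostgreSQL, and the function names and sample table are assumptions for illustration.

```python
import sqlite3
from collections import Counter


def run(conn: sqlite3.Connection, sql: str):
    """Execute `sql` and return its rows as a multiset, or None on error."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_match(conn: sqlite3.Connection, predicted: str, gold: str) -> bool:
    """True if the predicted query returns the same rows as the gold query.

    Row order is ignored; duplicate rows are counted.
    """
    expected = run(conn, gold)
    actual = run(conn, predicted)
    return expected is not None and actual == expected


# Hypothetical table and queries, used only to exercise the comparison.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE singer (name TEXT, age INTEGER);"
    "INSERT INTO singer VALUES ('Ann', 30), ('Bob', 45);"
)
gold = "SELECT name FROM singer WHERE age > 40"
print(execution_match(conn, "SELECT name FROM singer WHERE age >= 40", gold))
print(execution_match(conn, "SELECT name FROM singer", gold))
```

    Comparing against one gold query is strict in the sense that a prediction is scored by its result rows on this database only, so differently written queries pass only if they happen to return identical rows.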

  16. fixed_spider

    • huggingface.co
    Updated Jun 14, 2024
    Cite
    Turbular (2024). fixed_spider [Dataset]. https://huggingface.co/datasets/Turbular/fixed_spider
    Dataset updated
    Jun 14, 2024
    Dataset authored and provided by
    Turbular
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Cleaned Spider Dataset for Text2SQL

      Dataset Summary
    

    The Cleaned Spider Dataset for Text2SQL is an improved version of the original Spider dataset, which is a large-scale, complex, and cross-domain semantic parsing and text-to-SQL dataset. This enhanced version addresses several critical issues found in the original dataset, ensuring higher quality and reliability for training text-to-SQL models. The enhancements were made possible through Turbular's advanced data… See the full description on the dataset page: https://huggingface.co/datasets/Turbular/fixed_spider.

  17. spider-corpus-test

    • huggingface.co
    Updated Sep 21, 2024
    + more versions
    Cite
    TARGET Benchmark (2024). spider-corpus-test [Dataset]. https://huggingface.co/datasets/target-benchmark/spider-corpus-test
    Dataset updated
    Sep 21, 2024
    Authors
    TARGET Benchmark
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Link to original dataset: https://yale-lily.github.io/spider

    Yu, T., Zhang, R., Yang, K., Yasunaga, M., Wang, D., Li, Z., Ma, J., Li, I., Yao, Q., Roman, S. and Zhang, Z., 2018. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task. arXiv preprint arXiv:1809.08887.

  18. sql-create-context-copy

    • huggingface.co
    Updated Apr 21, 2023
    + more versions
    Cite
    Philipp Schmid (2023). sql-create-context-copy [Dataset]. https://huggingface.co/datasets/philschmid/sql-create-context-copy
    Dataset updated
    Apr 21, 2023
    Authors
    Philipp Schmid
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Fork of b-mc2/sql-create-context

      Overview
    

    This dataset builds from WikiSQL and Spider. There are 78,577 examples of natural language queries, SQL CREATE TABLE statements, and SQL Query answering the question using the CREATE statement as context. This dataset was built with text-to-sql LLMs in mind, intending to prevent hallucination of column and table names often seen when trained on text-to-sql datasets. The CREATE TABLE statement can often be copy and pasted from… See the full description on the dataset page: https://huggingface.co/datasets/philschmid/sql-create-context-copy.

  19. spider-clean-text-to-sql-4

    • huggingface.co
    Updated Jun 5, 2024
    + more versions
    Cite
    Victor Dubus-Chanson (2024). spider-clean-text-to-sql-4 [Dataset]. https://huggingface.co/datasets/VictorDCh/spider-clean-text-to-sql-4
    Dataset updated
    Jun 5, 2024
    Authors
    Victor Dubus-Chanson
    Description

    VictorDCh/spider-clean-text-to-sql-4 dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. spider-en-pt-es-fr

    • huggingface.co
    Updated Jan 16, 2024
    + more versions
    Cite
    Marcelo Archanjo Jose (2024). spider-en-pt-es-fr [Dataset]. https://huggingface.co/datasets/Marchanjo/spider-en-pt-es-fr
    Dataset updated
    Jan 16, 2024
    Authors
    Marcelo Archanjo Jose
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Distributed under the Creative Commons BY-SA 4.0 license, respecting the ShareAlike terms of the Spider dataset. Code explanations and links to the model checkpoints and datasets are on GitHub (mRAT-SQL). The model checkpoints and datasets can be downloaded from the Hugging Face collection, but the GitHub repository (mRAT-SQL) is the better place to understand them.

      mRAT-SQL-FIT

      A Multilingual Translator to SQL with Database Schema Pruning to Improve Self-Attention
    

    Marcelo Archanjo Jose, Fabio… See the full description on the dataset page: https://huggingface.co/datasets/Marchanjo/spider-en-pt-es-fr.

Cite
XLang NLP Lab (2021). spider [Dataset]. https://huggingface.co/datasets/xlangai/spider

xlangai/spider

2 scholarly articles cite this dataset