Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Card for Spider
Dataset Summary
Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases.
Supported Tasks and Leaderboards
The leaderboard can be seen at https://yale-lily.github.io/spider
Languages
The text in the dataset is in English.
Dataset Structure
Data… See the full description on the dataset page: https://huggingface.co/datasets/xlangai/spider.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for Spider-Realistic
This dataset variant contains only the Spider Realistic dataset used in "Structure-Grounded Pretraining for Text-to-SQL". The dataset is based on the dev split of the Spider dataset (2020-06-07 version from https://yale-lily.github.io/spider). The authors of the dataset modified the original questions to remove explicit mentions of column names while keeping the SQL queries unchanged, to better evaluate the model's capability in aligning… See the full description on the dataset page: https://huggingface.co/datasets/aherntech/spider-realistic.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for Spider Schema
Dataset Summary
Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases. This dataset contains the 166 databases used in the Spider dataset.
Yale Lily Spider Leaderboards
The leaderboard can be seen at https://yale-lily.github.io/spider
Languages
The text in… See the full description on the dataset page: https://huggingface.co/datasets/richardr1126/spider-schema.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Distributed under the Creative Commons BY-SA 4.0 license, respecting the ShareAlike terms of the Spider dataset. Code explanations and links to the model's checkpoints and datasets are on GitHub (mRAT-SQL). The model's checkpoints and datasets can be downloaded from the Hugging Face collection, but the GitHub mRAT-SQL repository is the better place to understand them.
mRAT-SQL-FIT
A Multilingual Translator to SQL with Database Schema Pruning to Improve Self-Attention
Marcelo Archanjo Jose, Fabio… See the full description on the dataset page: https://huggingface.co/datasets/Marchanjo/spider-en-pt-es-fr.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Cleaned Spider Dataset for Text2SQL
Dataset Summary
The Cleaned Spider Dataset for Text2SQL is an improved version of the original Spider dataset, which is a large-scale, complex, and cross-domain semantic parsing and text-to-SQL dataset. This enhanced version addresses several critical issues found in the original dataset, ensuring higher quality and reliability for training text-to-SQL models. The enhancements were made possible through Turbular's advanced data… See the full description on the dataset page: https://huggingface.co/datasets/Turbular/fixed_spider.
radhikachapaneri/spider-sql dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for Spider Skeleton Context Instruct
Dataset Summary
Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases. This dataset was created to finetune LLMs in a "### Instruction:" and "### Response:" format with database context.
Yale Lily Spider Leaderboards
The leaderboard can be seen at… See the full description on the dataset page: https://huggingface.co/datasets/richardr1126/spider-skeleton-context-instruct.
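As a rough illustration of the "### Instruction:" / "### Response:" format with database context mentioned above, the sketch below assembles one training string. The exact template used by the dataset is not shown in this excerpt, so the wording of the instruction, the function names, and the sample schema and question are all assumptions made for illustration.

```python
def build_prompt(question: str, schema: str) -> str:
    """Assemble an instruction-style prompt with database context.

    The layout is a hypothetical one; only the two section markers
    come from the dataset description.
    """
    return (
        "### Instruction:\n"
        "Convert the question into a SQL query for the database below.\n\n"
        f"Database schema:\n{schema.strip()}\n\n"
        f"Question: {question.strip()}\n\n"
        "### Response:\n"
    )


def build_training_text(question: str, schema: str, sql: str) -> str:
    """Append the reference SQL after the response marker for supervised finetuning."""
    return build_prompt(question, schema) + sql.strip()


# Made-up schema and question, used only to show the shape of one example.
schema = "CREATE TABLE singer (singer_id INTEGER, name TEXT, age INTEGER)"
example = build_training_text(
    "How many singers are there?", schema, "SELECT count(*) FROM singer"
)
print(example)
```

At inference time only `build_prompt` would be used, leaving the text after "### Response:" for the model to generate.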
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a large publicly available multi-center lumbar spine magnetic resonance imaging (MRI) dataset with reference segmentations of vertebrae, intervertebral discs (IVDs), and spinal canal. The dataset includes 447 sagittal T1 and T2 MRI series from 218 studies of 218 patients with a history of low back pain. The data was collected from four different hospitals. There is an additional hidden test set, not available here, used in the accompanying SPIDER challenge on spider.grand-challenge.org. We share this data to encourage wider participation and collaboration in the field of spine segmentation, and ultimately improve the diagnostic value of lumbar spine MRI.
This file also provides the biological sex for all patients and the age for the patients for which this was available. It also includes a number of scanner and acquisition parameters for each individual MRI study. The dataset also comes with radiological gradings found in a separate file for the following degenerative changes:
1. Modic changes (type I, II or III)
2. Upper and lower endplate changes / Schmorl nodes (binary)
3. Spondylolisthesis (binary)
4. Disc herniation (binary)
5. Disc narrowing (binary)
6. Disc bulging (binary)
7. Pfirrmann grade (grade 1 to 5).
All radiological gradings are provided per IVD level.
Repository: https://zenodo.org/records/10159290 Paper: https://www.nature.com/articles/s41597-024-03090-w
Link to original dataset: https://yale-lily.github.io/spider Yu, T., Zhang, R., Yang, K., Yasunaga, M., Wang, D., Li, Z., Ma, J., Li, I., Yao, Q., Roman, S. and Zhang, Z., 2018. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task. arXiv preprint arXiv:1809.08887.
https://choosealicense.com/licenses/other/
hyess/spider dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset builds on WikiSQL and Spider. There are 78,577 examples of natural language queries, SQL CREATE TABLE statements, and SQL queries answering the question using the CREATE statement as context. This dataset was built with text-to-SQL LLMs in mind, intending to prevent the hallucination of column and table names often seen when models are trained on text-to-SQL datasets. The CREATE TABLE statement can often be copied and pasted from different DBMSs and provides table names, column names, and their data types. By providing just the CREATE TABLE statement as context, we can hopefully provide better grounding for models without having to provide actual rows of data, limiting token usage and exposure to private, sensitive, or proprietary data.
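One way to make use of the CREATE TABLE context is to check whether a generated query refers only to tables and columns that the context defines. The sketch below does this with Python's built-in sqlite3 module by creating the empty tables in memory and asking SQLite to compile the query. The function name and the sample statements are hypothetical, and SQLite's dialect is an assumption: statements written for another DBMS may need adjusting before they parse.

```python
import sqlite3


def query_fits_schema(create_statements: str, sql: str) -> bool:
    """Return True if `sql` compiles against the tables in `create_statements`."""
    conn = sqlite3.connect(":memory:")
    try:
        # Build empty tables from the context; no rows of data are needed.
        conn.executescript(create_statements)
        # EXPLAIN makes SQLite compile the statement without running it,
        # which fails on unknown tables or columns.
        conn.execute(f"EXPLAIN {sql}")
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


# Made-up context and queries, for illustration only.
context = "CREATE TABLE head (age INTEGER, name VARCHAR);"
grounded = "SELECT count(*) FROM head WHERE age > 56"
hallucinated = "SELECT count(*) FROM head WHERE salary > 56"

print(query_fits_schema(context, grounded))
print(query_fits_schema(context, hallucinated))
```

A check like this only catches names missing from the schema; it says nothing about whether the query answers the question.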
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Card for spider-ko: Korean Text-to-SQL Dataset
Dataset Summary
Spider-KO is a text-to-SQL dataset produced by translating Yale University's Spider dataset into Korean. It was built by translating the natural language questions of the original Spider dataset into Korean. The dataset contains questions over databases from a variety of domains together with the corresponding SQL queries, and can be used to develop and evaluate Korean text-to-SQL models.
Supported Tasks and Leaderboards
text-to-sql: used for the task of converting Korean natural language questions into SQL queries.
Languages
The questions in the dataset were translated into Korean (ko), while the SQL queries were kept in their English-based form. The original English questions are also provided.
Dataset Structure
Data Fields
db_id… See the full description on the dataset page: https://huggingface.co/datasets/huggingface-KREW/spider-ko.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for Spider Context Validation
Dataset Summary
Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases. This dataset was created to validate Spider-fine-tuned LLMs with database context.
Yale Lily Spider Leaderboards
The leaderboard can be seen at https://yale-lily.github.io/spider… See the full description on the dataset page: https://huggingface.co/datasets/richardr1126/spider-context-validation.
simone-papicchio/spider dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
nanina1/spider dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Distributed under the Creative Commons BY-SA 4.0 license, respecting the ShareAlike terms of the Spider dataset. Code explanations and links to the model's checkpoints and datasets are on GitHub (mRAT-SQL). The model's checkpoints and datasets can be downloaded from the Hugging Face collection, but the GitHub mRAT-SQL repository is the better place to understand them.
mRAT-SQL-FIT
A Multilingual Translator to SQL with Database Schema Pruning to Improve Self-Attention
Marcelo Archanjo Jose, Fabio… See the full description on the dataset page: https://huggingface.co/datasets/Marchanjo/spider-FIT-en-enr-enb.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for Spider
Dataset Summary
Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases.
Supported Tasks and Leaderboards
The leaderboard can be seen at https://yale-lily.github.io/spider
Languages
The text in the dataset is in English.
Dataset Structure
Data… See the full description on the dataset page: https://huggingface.co/datasets/HusnaManakkot/new-spider-HM.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
mjerome89/ORPRO-Spider-SQL-Filtered dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for "spider-tableQA"
Usage
import pandas as pd
from datasets import load_dataset

spider_tableQA = load_dataset("vaishali/spider-tableQA")

for sample in spider_tableQA['train']:
    question = sample['question']
    sql_query = sample['query']
    input_table_names = sample["table_names"]
    input_tables = [pd.read_json(table, orient='split') for table in sample['tables']]
    answer = pd.read_json(sample['answer'], orient='split')

# flattened input/output… See the full description on the dataset page: https://huggingface.co/datasets/vaishali/spider-tableQA.
karlen532/spider dataset hosted on Hugging Face and contributed by the HF Datasets community