100+ datasets found
  1. Data from: Text to SQL dataset

    • kaggle.com
    Updated Jul 21, 2024
    Cite
    Mohammad Nour Alawad (2024). Text to SQL dataset [Dataset]. https://www.kaggle.com/datasets/mohammadnouralawad/spider-text-sql
    Dataset updated
    Jul 21, 2024
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Mohammad Nour Alawad
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset consists of 8,034 entries designed to evaluate the performance of text-to-SQL models. Each entry contains a natural language text query and its corresponding SQL command. The dataset is a subset derived from the Spider dataset, focusing on diverse and complex queries to challenge the understanding and generation capabilities of machine learning models.

  2. Bike Store Relational Database | SQL

    • kaggle.com
    zip
    Updated Aug 21, 2023
    Cite
    Dillon Myrick (2023). Bike Store Relational Database | SQL [Dataset]. https://www.kaggle.com/datasets/dillonmyrick/bike-store-sample-database
    Available download formats: zip (94412 bytes)
    Dataset updated
    Aug 21, 2023
    Authors
    Dillon Myrick
    Description

    This is the sample database from sqlservertutorial.net. This is a great dataset for learning SQL and practicing querying relational databases.

    Database Diagram:

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F4146319%2Fc5838eb006bab3938ad94de02f58c6c1%2FSQL-Server-Sample-Database.png?generation=1692609884383007&alt=media

    Terms of Use

    The sample database is copyrighted and cannot be used for commercial purposes. Prohibited uses include, but are not limited to:
    • Selling
    • Including in paid courses

  3. NL2SQL_Query_Dataset

    • kaggle.com
    zip
    Updated Dec 22, 2023
    Cite
    Suresh Muthusamy P (2023). NL2SQL_Query_Dataset [Dataset]. https://www.kaggle.com/datasets/sureshmuthusamy001p/nl2sql-query-dataset
    Available download formats: zip (231382 bytes)
    Dataset updated
    Dec 22, 2023
    Authors
    Suresh Muthusamy P
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset is designed for training models to convert natural language prompts into SQL queries, specifically focusing on SELECT statements. The dataset comprises 14,815 examples where each prompt is associated with the corresponding SQL query that would retrieve the desired information from a specific table.

    Columns:
    • Prompt: The natural language text representing a query request.
    • SQL Query: The corresponding SQL query generated to fulfill the request.

  4. Housing - SQL Project

    • kaggle.com
    Updated Jun 13, 2023
    Cite
    Ann Truong (2023). Housing - SQL Project [Dataset]. https://www.kaggle.com/datasets/bvanntruong/housing-sql-project
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Ann Truong
    Description

    This dataset contains information about housing sales in Nashville, TN such as property, owner, sales, and tax information. The SQL queries I created for Data Cleaning can be found here.

  5. (Sunset)📒 Meta Kaggle ported to MS SQL SERVER

    • kaggle.com
    zip
    Updated Mar 20, 2024
    Cite
    BwandoWando (2024). (Sunset)📒 Meta Kaggle ported to MS SQL SERVER [Dataset]. https://www.kaggle.com/datasets/bwandowando/meta-kaggle-ported-to-sql-server-2022-database
    Available download formats: zip (8635902534 bytes)
    Dataset updated
    Mar 20, 2024
    Authors
    BwandoWando
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    I've always wanted to explore Kaggle's Meta Kaggle dataset, but I am more comfortable using T-SQL when writing (very) complex queries. I also tend to write queries faster in SQL Server Management Studio, like 100x faster. So I ported Kaggle's Meta Kaggle dataset into MS SQL Server 2022 database format, created a backup file, and uploaded it here.

    • MSSQL VERSION: SQL Server 2022
    • Collation: SQL_Latin1_General_CP1_CI_AS
    • Recovery model: simple

    Requirements

    • Download and install the SQL SERVER 2022 Developer edition here
    • Download the backup file
    • Restore the backup file to your local SQL Server instance. If you haven't done this before, it's easy and straightforward. Here is a guide.

    (QUOTED FROM THE ORIGINAL DATASET)

    Meta Kaggle

    Explore Kaggle's public data on competitions, datasets, kernels (code/notebooks), and more.

    Meta Kaggle may not be the Rosetta Stone of data science, but they think there's a lot to learn (and plenty of fun to be had) from this collection of rich data about Kaggle's community and activity.

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F1842206%2F2ad97bce7839d6e57674e7a82981ed23%2F2Egeb8R.png?generation=1688912953875842&alt=media

    Notes

  6. SQL Injection dataset

    • kaggle.com
    zip
    Updated Jun 18, 2024
    Cite
    Ayah Khaldi (2024). SQL Injection dataset [Dataset]. https://www.kaggle.com/datasets/ayahkhaldi/sql-injection-dataset
    Available download formats: zip (26223784 bytes)
    Dataset updated
    Jun 18, 2024
    Authors
    Ayah Khaldi
    Description

    SQL Injection Dataset:

    This SQL injection dataset, sourced from multiple Kaggle datasets, has been cleaned and split into training, validation, and testing subsets in a 6:2:2 ratio. It is intended for research on detecting SQL injection attacks.

    Kaggle datasets:
    The dataset contains two columns:
    1. Query: This column represents the SQL query.
    2. Label: This column represents the label for the SQL injection binary classification, where 1 indicates an SQL injection and 0 indicates a non-SQL injection.
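
A minimal sketch of a 6:2:2 split over (Query, Label) rows, using only the Python standard library. The sample rows, the function name `split_622`, and the fixed seed are assumptions for illustration; the dataset itself already ships pre-split, and its authors' exact splitting procedure is not described in the listing.

```python
import random


def split_622(rows, seed=0):
    """Shuffle rows and split them into train/validation/test at 6:2:2."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # deterministic shuffle for a given seed
    n_train = len(rows) * 6 // 10      # integer arithmetic avoids float rounding
    n_val = len(rows) * 2 // 10
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]      # remainder goes to the test subset
    return train, val, test


# Hypothetical (Query, Label) rows: 1 = SQL injection, 0 = benign.
rows = [
    ("SELECT name FROM users WHERE id = 4", 0),
    ("' OR '1'='1' --", 1),
] * 5

train, val, test = split_622(rows)
print(len(train), len(val), len(test))
```

A stratified split (keeping the label ratio equal in each subset) may be preferable when the classes are imbalanced; the sketch above does not do that.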

      https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20832724%2Fc0d648e947cec02e35beb665b24b5bdb%2Fsql-injection-datasets.png?generation=1718710914956418&alt=media

  7. WikiSQL (Questions and SQL Queries)

    • kaggle.com
    zip
    Updated Nov 25, 2022
    Cite
    The Devastator (2022). WikiSQL (Questions and SQL Queries) [Dataset]. https://www.kaggle.com/datasets/thedevastator/dataset-for-developing-natural-language-interfac
    Available download formats: zip (21491264 bytes)
    Dataset updated
    Nov 25, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    WikiSQL (Questions and SQL Queries)

    80654 hand-annotated questions and SQL queries on 24241 Wikipedia tables

    By Huggingface Hub [source]

    About this dataset

    A large crowd-sourced dataset for developing natural language interfaces for relational databases. WikiSQL is a dataset of 80654 hand-annotated examples of questions and SQL queries distributed across 24241 tables from Wikipedia.

    How to use the dataset

    This dataset can be used to develop natural language interfaces for relational databases. The data fields are the same across all splits: each file records the phase, question, table, and SQL query for each example.

    Research Ideas

    • This dataset can be used to develop natural language interfaces for relational databases.
    • This dataset can be used to develop a knowledge base of common SQL queries.
    • This dataset can be used to generate a training set for a neural network that translates natural language into SQL queries.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub.

    Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0), Public Domain Dedication. No Copyright: You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    Files: validation.csv, train.csv, test.csv (all three files share the same columns)

    | Column name | Description |
    |:------------|:------------|
    | phase | The phase of the data collection. (String) |
    | question | The question asked by the user. (String) |
    | table | The table containing the data for the question. (String) |
    | sql | The SQL query corresponding to the question. (String) |
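
As a rough illustration of the column layout above, the following sketch reads question/SQL pairs with Python's `csv` module. The in-memory sample row and the helper name `load_pairs` are assumptions; the real files may encode the `table` and `sql` fields differently (the listing only says they are strings).

```python
import csv
import io

# Hypothetical sample with the four documented columns; not taken from the dataset.
SAMPLE = """phase,question,table,sql
1,How many rows are there?,example_table,SELECT COUNT(*) FROM example_table
"""


def load_pairs(fileobj):
    """Return (question, sql) tuples from a CSV with the documented header."""
    reader = csv.DictReader(fileobj)
    return [(row["question"], row["sql"]) for row in reader]


pairs = load_pairs(io.StringIO(SAMPLE))
print(pairs)
```

To read a downloaded split instead, something like `open("train.csv", newline="", encoding="utf-8")` could be passed in place of the `io.StringIO` object.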


  8. E-commerce dataset by Olist (SQLite)

    • kaggle.com
    zip
    Updated Apr 28, 2024
    Cite
    Terenci Claramunt (2024). E-commerce dataset by Olist (SQLite) [Dataset]. https://www.kaggle.com/datasets/terencicp/e-commerce-dataset-by-olist-as-an-sqlite-database
    Available download formats: zip (51085670 bytes)
    Dataset updated
    Apr 28, 2024
    Authors
    Terenci Claramunt
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    I imported the two Olist Kaggle datasets into an SQLite database. I modified the original table names to make them shorter and easier to understand. Here's the Entity-Relationship Diagram of the resulting SQLite database:

    Database schema: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2473556%2F23a7d4d8cd99e36e32e57303eb804fff%2Fdb-schema.png?generation=1714391550829633&alt=media

    Data sources:

    https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce

    https://www.kaggle.com/datasets/olistbr/marketing-funnel-olist


    I used this database as a data source for my notebook:

    SQL Challenge: E-commerce data analysis

  9. SQL Case Study for Data Analysts

    • kaggle.com
    zip
    Updated Jan 29, 2025
    Cite
    ShravyaShetty1 (2025). SQL Case Study for Data Analysts [Dataset]. https://www.kaggle.com/datasets/shravyashetty1/sql-basic-case-study
    Available download formats: zip (59519 bytes)
    Dataset updated
    Jan 29, 2025
    Authors
    ShravyaShetty1
    Description

    This dataset is a practical SQL case study designed for learners who are looking to enhance their SQL skills in analyzing sales, products, and marketing data. It contains several SQL queries related to a simulated business database for product sales, marketing expenses, and location data. The database consists of three main tables: Fact, Product, and Location.

    Objective of the Case Study: The purpose of this case study is to provide learners with a variety of practical SQL exercises that involve real-world business problems. The queries explore topics such as:

    • Aggregating data (e.g., sum, count, average)
    • Filtering and sorting data
    • Grouping and joining multiple tables
    • Using SQL functions like AVG(), COUNT(), SUM(), and MIN/MAX()
    • Handling advanced SQL features such as row numbering, transactions, and stored procedures
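
The kind of exercise listed above (joining, grouping, and aggregating) might look like the following sketch, run against an in-memory SQLite database. The table names Fact, Product, and Location come from the description; every column name and data value here is a hypothetical stand-in, since the listing does not give the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows; only the three table names come from the listing.
conn.executescript("""
CREATE TABLE Product  (ProductId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Location (AreaCode INTEGER PRIMARY KEY, State TEXT);
CREATE TABLE Fact     (ProductId INTEGER, AreaCode INTEGER, Sales REAL);
INSERT INTO Product  VALUES (1, 'Coffee'), (2, 'Tea');
INSERT INTO Location VALUES (100, 'Ohio'), (200, 'Utah');
INSERT INTO Fact     VALUES (1, 100, 50.0), (1, 200, 70.0), (2, 100, 30.0);
""")

# Join the fact table to both dimension tables, then aggregate sales per product.
rows = conn.execute("""
SELECT p.Name,
       COUNT(*)                 AS n_sales,
       SUM(f.Sales)             AS total_sales,
       AVG(f.Sales)             AS avg_sales,
       COUNT(DISTINCT l.State)  AS n_states
FROM Fact AS f
JOIN Product  AS p ON p.ProductId = f.ProductId
JOIN Location AS l ON l.AreaCode  = f.AreaCode
GROUP BY p.Name
ORDER BY total_sales DESC
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

SQLite is used here only because it ships with Python; features such as stored procedures mentioned in the list are not available in SQLite and would need a different engine.
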
  10. IMDB Movies Analysis - SQL

    • kaggle.com
    zip
    Updated Feb 21, 2023
    Cite
    Gaurav B R (2023). IMDB Movies Analysis - SQL [Dataset]. https://www.kaggle.com/datasets/gauravbr/imdb-movies-data-erd
    Available download formats: zip (3818401 bytes)
    Dataset updated
    Feb 21, 2023
    Authors
    Gaurav B R
    Description

    SQL IMDB Movies Analysis for RSVP (Film Production Company)

    RSVP Movies is an Indian film production company which has produced many super-hit movies. They have usually released movies for the Indian audience but for their next project, they are planning to release a movie for the global audience in 2022.

    The production company wants to plan every move analytically, based on data. We have taken the last three years of IMDB movie data and carried out the analysis using SQL. We analysed the data set and drew meaningful insights that could help them start their new project.

    For our convenience, the entire analytics process has been divided into four segments, where each segment leads to significant insights from different combinations of tables. The questions in each segment with business objectives are written in the script given below. We have written the solution code below every question.

  11. 🖼️ Famous Paintings

    • kaggle.com
    zip
    Updated Oct 5, 2023
    Cite
    mexwell (2023). 🖼️ Famous Paintings [Dataset]. https://www.kaggle.com/datasets/mexwell/famous-paintings
    Available download formats: zip (6681482 bytes)
    Dataset updated
    Oct 5, 2023
    Authors
    mexwell
    Description

    Famous paintings and their artists. This data set is published to give students interesting data for practicing SQL.

    Original Data

    Acknowledgement

    Photo by Steve Johnson on Unsplash

  12. PL/SQL

    • kaggle.com
    zip
    Updated Jan 16, 2023
    Cite
    Rodrigo Fontanella (2023). PL/SQL [Dataset]. https://www.kaggle.com/datasets/rodrigofontanella/modelo-fisico
    Available download formats: zip (10685 bytes)
    Dataset updated
    Jan 16, 2023
    Authors
    Rodrigo Fontanella
    Description

    Dataset

    This dataset was created by Rodrigo Fontanella

    Contents

  13. Hospital Database Management System SQL Project

    • kaggle.com
    zip
    Updated May 9, 2024
    Cite
    Andrew Dolcimascolo-Garrett (2024). Hospital Database Management System SQL Project [Dataset]. https://www.kaggle.com/datasets/andrewdolcigarrett/hospital-database-management-system-sql-project
    Available download formats: zip (1487278 bytes)
    Dataset updated
    May 9, 2024
    Authors
    Andrew Dolcimascolo-Garrett
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Andrew Dolcimascolo-Garrett

    Released under MIT

    Contents

  14. SQL Query Examples

    • kaggle.com
    zip
    Updated May 6, 2024
    Cite
    Michael Nowell (2024). SQL Query Examples [Dataset]. https://www.kaggle.com/datasets/michaelnowell/sql-query-examples
    Available download formats: zip (1236106 bytes)
    Dataset updated
    May 6, 2024
    Authors
    Michael Nowell
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Dataset

    This dataset was created by Michael Nowell

    Released under Database: Open Database, Contents: Database Contents

    Contents

  15. Data from: text-to-sql

    • kaggle.com
    zip
    Updated Jun 11, 2024
    Cite
    Yentür (2024). text-to-sql [Dataset]. https://www.kaggle.com/datasets/meryentr/text-to-sql
    Available download formats: zip (2533086 bytes)
    Dataset updated
    Jun 11, 2024
    Authors
    Yentür
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Yentür

    Released under MIT

    Contents

  16. Text2SQL Dataset

    • kaggle.com
    zip
    Updated Mar 30, 2025
    Cite
    Tarakanta Acharya (2025). Text2SQL Dataset [Dataset]. https://www.kaggle.com/datasets/tarakantaacharya/text2sql-dataset
    Available download formats: zip (12165 bytes)
    Dataset updated
    Mar 30, 2025
    Authors
    Tarakanta Acharya
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    🔍 Overview

    This dataset is built for Text-to-SQL (NL → SQL) tasks, helping train models to convert natural language into SQL queries. It is ideal for fine-tuning LLMs, developing AI-powered database assistants, and improving SQL query generation accuracy.

    📂 Dataset Structure

    Each row contains the following fields:
    - 📝 Instruction – A natural language query (e.g., "Find all customers who placed an order in the last 30 days.")
    - 📊 Query – The corresponding SQL statement (e.g., SELECT * FROM orders WHERE order_date >= DATE_SUB(NOW(), INTERVAL 30 DAY);)
    - 🗄️ Database – Contains metadata such as:
      - Table Names – The relevant tables for the query (e.g., orders, customers)
      - Column Names – The specific fields used in the query (e.g., order_date, customer_id)
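
One way to picture a row with these fields is the sketch below, which holds a record as a Python dictionary and renders it into a prompt string for fine-tuning. The key names (`instruction`, `query`, `database`, `table_names`, `column_names`), the helper `to_prompt`, and the prompt layout are assumptions; only the example values are taken from the description, and the file's actual serialization is not documented in the listing.

```python
# Hypothetical in-memory form of one dataset row; key names are illustrative.
row = {
    "instruction": "Find all customers who placed an order in the last 30 days.",
    "query": (
        "SELECT * FROM orders "
        "WHERE order_date >= DATE_SUB(NOW(), INTERVAL 30 DAY);"
    ),
    "database": {
        "table_names": ["orders", "customers"],
        "column_names": ["order_date", "customer_id"],
    },
}


def to_prompt(row):
    """Render the schema metadata and question as a prompt ending in 'SQL:'."""
    tables = ", ".join(row["database"]["table_names"])
    columns = ", ".join(row["database"]["column_names"])
    return (
        f"Tables: {tables}\n"
        f"Columns: {columns}\n"
        f"Question: {row['instruction']}\n"
        "SQL:"
    )


prompt = to_prompt(row)
print(prompt)
print(row["query"])  # the target completion for this prompt
```

Note that `DATE_SUB(NOW(), INTERVAL 30 DAY)` in the example query is MySQL-style syntax, so generated queries may need adapting for other engines.
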

    🚀 Use Cases

    • Fine-tuning Large Language Models (LLMs) for SQL generation
    • Training AI chatbots to assist with SQL query building
    • Developing database assistants for automated SQL generation
    • Enhancing Retrieval-Augmented Generation (RAG) for SQL-based applications
  17. sql dataset

    • kaggle.com
    zip
    Updated Nov 9, 2025
    Cite
    Ada Luo daa (2025). sql dataset [Dataset]. https://www.kaggle.com/datasets/adaluodaa/sql-dataset
    Available download formats: zip (296364091 bytes)
    Dataset updated
    Nov 9, 2025
    Authors
    Ada Luo daa
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Ada Luo daa

    Released under Apache 2.0

    Contents

  18. Employee dataset table

    • kaggle.com
    zip
    Updated Jun 23, 2024
    Cite
    vishnusan (2024). Employee dataset table [Dataset]. https://www.kaggle.com/datasets/vishnusan/employee-dataset-table
    Available download formats: zip (70503 bytes)
    Dataset updated
    Jun 23, 2024
    Authors
    vishnusan
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by vishnusan

    Released under Apache 2.0

    Contents

  19. SQL project data set

    • kaggle.com
    zip
    Updated Sep 25, 2024
    Cite
    Parth Mistry 20 (2024). SQL project data set [Dataset]. https://www.kaggle.com/datasets/parthmistry20/sql-project-data-set
    Available download formats: zip (205161 bytes)
    Dataset updated
    Sep 25, 2024
    Authors
    Parth Mistry 20
    Description

    Dataset

    This dataset was created by Parth Mistry 20

    Contents

  20. SQL Injection Dataset

    • kaggle.com
    Updated Aug 5, 2024
    Cite
    rayten (2024). SQL Injection Dataset [Dataset]. https://www.kaggle.com/datasets/rayten/sql-injection-dataset
    Dataset updated
    Aug 5, 2024
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    rayten
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by rayten

    Released under Apache 2.0

    Contents

Data from: Text to SQL dataset

A Dataset for Evaluating Text-to-SQL Systems
