11 datasets found
  1. SQL Injection Attack Netflow

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 28, 2022
    + more versions
    Cite
    Adrián Campazas (2022). SQL Injection Attack Netflow [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6907251
    Explore at:
    Dataset updated
    Sep 28, 2022
    Dataset provided by
    Adrián Campazas
    Ignacio Crespo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    This dataset contains SQL injection attacks (SQLIA) as malicious NetFlow data. The attacks carried out are Union-query SQL injection and Blind SQL injection, performed with the SQLMAP tool.

    The NetFlow traffic was generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for collecting and monitoring network traffic flow data. A flow is defined as a unidirectional sequence of packets sharing common properties that pass through a network device.

    Datasets

    The first dataset (D1) was collected to train the detection models; the second (D2) was collected using different attacks than those used in training, in order to test the models and ensure their generalization.

    The datasets contain both benign and malicious traffic. All collected datasets are balanced.

    The version of NetFlow used to build the datasets is 5.

        Dataset   Aim        Samples   Benign-malicious traffic ratio
        D1        Training   400,003   50%
        D2        Test       57,239    50%

    Infrastructure and implementation

    Two sets of flow data were collected with DOROTHEA. DOROTHEA is a Docker-based framework for NetFlow data collection. It allows you to build interconnected virtual networks to generate and collect flow data using the NetFlow protocol. In DOROTHEA, network traffic packets are sent to a NetFlow generator that has a sensor ipt_netflow installed. The sensor consists of a module for the Linux kernel using Iptables, which processes the packets and converts them to NetFlow flows.

    DOROTHEA is configured to use NetFlow v5 and to export a flow after it has been inactive for 15 seconds or active for 1800 seconds (30 minutes).
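As an illustration of this flow model (this sketch is not part of DOROTHEA itself), the following code groups packets into unidirectional flows keyed by the usual NetFlow five-tuple and applies the same 15-second inactive / 1800-second active timeouts described above:

```python
from collections import namedtuple

# Five-tuple that identifies a unidirectional flow.
FlowKey = namedtuple("FlowKey", "src_ip dst_ip src_port dst_port proto")

INACTIVE_TIMEOUT = 15    # seconds without a packet before export
ACTIVE_TIMEOUT = 1800    # maximum flow lifetime (30 minutes)

def aggregate(packets):
    """packets: iterable of (ts, src_ip, dst_ip, src_port, dst_port, proto, nbytes)."""
    active = {}      # FlowKey -> [first_seen, last_seen, packet_count, byte_count]
    exported = []

    def export(key):
        first, last, pkts, nbytes = active.pop(key)
        exported.append({"key": key, "first": first, "last": last,
                         "packets": pkts, "bytes": nbytes})

    for ts, *tup, nbytes in sorted(packets):
        key = FlowKey(*tup)
        # Expire any flow that hit either timeout before this packet arrived.
        for k in list(active):
            first, last, _, _ = active[k]
            if ts - last > INACTIVE_TIMEOUT or ts - first > ACTIVE_TIMEOUT:
                export(k)
        if key in active:
            rec = active[key]
            rec[1] = ts
            rec[2] += 1
            rec[3] += nbytes
        else:
            active[key] = [ts, ts, 1, nbytes]

    for k in list(active):    # flush remaining flows at end of capture
        export(k)
    return exported
```

A packet that arrives more than 15 seconds after the previous one on the same five-tuple therefore starts a new flow record, which is how one connection can yield several NetFlow entries.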

    Benign traffic generation nodes simulate network traffic generated by real users, performing tasks such as searching in web browsers, sending emails, or establishing Secure Shell (SSH) connections. Such tasks run as Python scripts; users may customize them or incorporate their own. The network traffic is managed by a gateway that performs two main tasks: on the one hand, it routes packets to the Internet; on the other, it sends them to a NetFlow data generation node (packets received from the Internet are processed in the same way).

    The malicious traffic collected (SQLI attacks) was generated using SQLMAP. SQLMAP is a penetration testing tool used to automate the process of detecting and exploiting SQL injection vulnerabilities.

    The attacks were executed from 16 nodes, each launching SQLMAP with the parameters listed in the following table.

        Parameter(s)            Description
        --banner, --current-user, --current-db, --hostname, --is-dba,
        --users, --passwords, --privileges, --roles, --dbs, --tables,
        --columns, --schema, --count, --dump, --comments
                                Enumerate users, password hashes, privileges, roles, databases, tables and columns
        --level=5               Increase the probability of a false positive identification
        --risk=3                Increase the probability of extracting data
        --random-agent          Select the User-Agent randomly
        --batch                 Never ask for user input; use the default behavior
        --answers="follow=Y"    Predefined answers to yes

    Each node executed SQLIAs against 200 victim nodes. The victim nodes deployed a web form vulnerable to Union-type injection attacks, connected to either a MySQL or a SQL Server database engine (50% of the victim nodes deployed MySQL and the other 50% SQL Server).
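For illustration, the parameters from the table above can be assembled into a single SQLMAP invocation. This sketch only builds the argument list; the target URL is a placeholder, not one used in the dataset:

```python
# Enumeration flags as listed in the parameter table above.
enumeration_flags = [
    "--banner", "--current-user", "--current-db", "--hostname", "--is-dba",
    "--users", "--passwords", "--privileges", "--roles", "--dbs", "--tables",
    "--columns", "--schema", "--count", "--dump", "--comments",
]

def build_sqlmap_command(target_url):
    """Assemble the sqlmap command line described in the table above."""
    return (["sqlmap", "-u", target_url]
            + enumeration_flags
            + ["--level=5", "--risk=3", "--random-agent",
               "--batch", "--answers=follow=Y"])
```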

    The web service was accessible from ports 443 and 80, which are the ports typically used to deploy web services. The IP address space was 182.168.1.1/24 for the benign and malicious traffic-generating nodes. For victim nodes, the address space was 126.52.30.0/24. The malicious traffic in the test sets was collected under different conditions. For D1, SQLIA was performed using Union attacks on the MySQL and SQLServer databases.

    However, for D2, BlindSQL SQLIAs were performed against the web form connected to a PostgreSQL database. The IP address spaces of the networks were also different from those of D1. In D2, the IP address space was 152.148.48.1/24 for benign and malicious traffic generating nodes and 140.30.20.1/24 for victim nodes.

    For the MySQL server, MariaDB version 10.4.12 was run; Microsoft SQL Server 2017 Express and PostgreSQL version 13 were used for the other engines.

  2. WikiSQL Dataset

    • paperswithcode.com
    • opendatalab.com
    Cite
    WikiSQL Dataset [Dataset]. https://paperswithcode.com/dataset/wikisql
    Explore at:
    Authors
    Victor Zhong; Caiming Xiong; Richard Socher
    Description

    WikiSQL consists of a corpus of 87,726 hand-annotated SQL query and natural language question pairs. These SQL queries are further split into training (61,297 examples), development (9,145 examples) and test sets (17,284 examples). It can be used for natural language inference tasks related to relational databases.
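The split sizes quoted above are internally consistent; a one-line check:

```python
# Split sizes as stated in the WikiSQL description above.
splits = {"train": 61_297, "dev": 9_145, "test": 17_284}
total = sum(splits.values())  # 87,726 -- matches the quoted corpus size
```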

  3. Stack Overflow Chat Dutch

    • data.niaid.nih.gov
    • huggingface.co
    • +1 more
    Updated Jun 20, 2023
    + more versions
    Cite
    Vanroy, Bram (2023). Stack Overflow Chat Dutch [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8052379
    Explore at:
    Dataset updated
    Jun 20, 2023
    Dataset authored and provided by
    Vanroy, Bram
    License

    Apache License 2.0: http://www.apache.org/licenses/LICENSE-2.0

    Description

    This dataset contains 56,964 conversations between an AI assistant and a (fake, machine-generated) "Human", in Dutch, specifically in the domain of programming (Stack Overflow). They are translations of Baize's machine-generated answers to the Stack Overflow dataset.

    Data Instances

    {'id': 9, 'input': 'Het gesprek tussen de mens en de AI-assistent. ' '[|Human|] Ik heb wat hulp nodig met Spark SQL union en het ordenen ' 'van kolommen. ' '[|AI|] Natuurlijk, ik help je graag. Waar heb je specifiek hulp ' 'bij nodig? ' '[|Human|] Ik wil de output van een UNION-query ordenen op een ' 'kolom die niet aanwezig is in de SELECT-instructie. Is dat ' 'mogelijk? ' '[|AI|] Ja, het is mogelijk met Spark SQL. Je kunt de ordinale ' 'positie van de kolom in de ORDER BY-clausule gebruiken in plaats ' 'van de kolomnaam. Als je bijvoorbeeld wilt ordenen op de derde ' 'kolom in de SELECT-instructie, kun je ORDER BY 3 gebruiken. ' '[|Human|] Dat is nuttig. Kan ik ook een alias gebruiken in de ' 'ORDER BY-clausule? ' '[|AI|] Ja, je kunt een alias gebruiken in de ORDER BY-clausule. De ' 'alias moet worden gedefinieerd in de SELECT-instructie. Als je ' 'bijvoorbeeld de derde kolom als "col3" alias geeft, kun je ordenen ' 'met ORDER BY col3. ' '[|Human|] Oké, ik denk dat ik het begrijp. Kan ik dezelfde alias ' 'gebruiken voor kolommen uit beide tabellen in de UNION? ' '[|AI|] Nee, je kunt dezelfde alias niet gebruiken voor kolommen ' 'uit beide tabellen in de UNION. Elke tabel in de UNION moet een ' 'unieke alias hebben voor de kolommen. ' '[|Human|] ', 'topic': 'Spark SQL UNION - ORDER BY kolom niet in SELECT'},

    Data Fields

    id: the ID of the item. The following 82 IDs are not included because they could not be translated: [1713, 1937, 1960, 4326, 4356, 8357, 8542, 8827, 9137, 9782, 11560, 11961, 12244, 12362, 12488, 13259, 13621, 14445, 14835, 15006, 17746, 18808, 19285, 19426, 19491, 21270, 21661, 22098, 23352, 23840, 23869, 25148, 25928, 27102, 27856, 28387, 29942, 30041, 30251, 32396, 32742, 32941, 33628, 34116, 34648, 34859, 35977, 35987, 36035, 36456, 37028, 37238, 37640, 38107, 38735, 39015, 40984, 41115, 41567, 42397, 43219, 43783, 44599, 44980, 45239, 47676, 48922, 49534, 50282, 50683, 50804, 50919, 51076, 51211, 52000, 52183, 52489, 52595, 53884, 54726, 55795, 56992]

    input: the machine-generated conversation between AI and "Human". Always starts with Het gesprek tussen de mens en de AI-assistent. and has at least one occurrence of both [|AI|] and [|Human|].

    topic: the topic description
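A hypothetical helper (not shipped with the dataset) shows how the input field can be split into speaker turns using the [|AI|] and [|Human|] markers described above:

```python
import re

def parse_turns(conversation):
    """Split an `input` string into (speaker, text) turns.

    The fixed Dutch preamble before the first marker is discarded.
    """
    parts = re.split(r"\[\|(AI|Human)\|\]", conversation)
    # parts = [preamble, speaker1, text1, speaker2, text2, ...]
    return [(speaker, text.strip())
            for speaker, text in zip(parts[1::2], parts[2::2])]
```

Note that every conversation ends with a trailing, empty `[|Human|]` turn, as in the data instance above.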

    Dataset Creation

    Both the conversations and the topics were translated with OpenAI's API for gpt-3.5-turbo, using max_tokens=1024 and temperature=0 as parameters.

    The prompt template to translate the input is (where src_lang was English and tgt_lang Dutch):

    CONVERSATION_TRANSLATION_PROMPT = """You are asked to translate a conversation between an AI assistant and a human from {src_lang} into {tgt_lang}.

    Here are the requirements that you should adhere to: 1. maintain the format: the conversation consists of the AI (marked as [|AI|]) and the human ([|Human|]) talking in turns and responding to each other; 2. do not translate the speaker identifiers [|AI|] and [|Human|] but always copy them into the translation in appropriate places; 3. ensure accurate translation and keep the correctness of the conversation; 4. make sure that text is fluent to read and does not contain grammatical errors. Use standard {tgt_lang} without regional bias; 5. translate the human's text using informal, but standard, language; 6. make sure to avoid biases (such as gender bias, grammatical bias, social bias); 7. if the human asks to correct grammar mistakes or spelling mistakes then you have to generate a similar mistake in {tgt_lang}, and then also generate a corrected output version for the AI in {tgt_lang}; 8. if the human asks to translate text from one to another language, then you only translate the human's question to {tgt_lang} but you keep the translation that the AI provides in the language that the human requested; 9. do not translate code fragments but copy them as they are. If there are English examples, variable names or definitions in code fragments, keep them in English.

    Now translate the following conversation with the requirements set out above. Do not provide an explanation and do not add anything else.

    """

    The prompt to translate the topic is:

    TOPIC_TRANSLATION_PROMPT = "Translate the following title of a conversation from {src_lang} to {tgt_lang} in a succinct, summarizing manner. Translate accurately and formally. Do not provide any explanation about the translation and do not include the original title."

    The system message was:

    You are a helpful assistant that translates English to Dutch to the requirements that are given to you.
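Putting the pieces above together, a plausible sketch of the per-conversation request (the client call itself is omitted; only the parameters and messages documented above are used, and how the conversation text was appended to the prompt is an assumption):

```python
SYSTEM_MESSAGE = ("You are a helpful assistant that translates English to Dutch "
                  "to the requirements that are given to you.")

def build_request(prompt_template, conversation, src_lang="English", tgt_lang="Dutch"):
    """Build the chat-completion payload described in the dataset card above."""
    return {
        "model": "gpt-3.5-turbo",
        "max_tokens": 1024,
        "temperature": 0,
        "messages": [
            {"role": "system", "content": SYSTEM_MESSAGE},
            {"role": "user",
             # Fill in the language placeholders, then append the conversation.
             "content": prompt_template.format(src_lang=src_lang, tgt_lang=tgt_lang)
                        + conversation},
        ],
    }
```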

    Note that 82 items (0.1%) were not successfully translated. The translation was missing the AI identifier [|AI|] and/or the human one [|Human|]. The IDs for the missing items are [1713, 1937, 1960, 4326, 4356, 8357, 8542, 8827, 9137, 9782, 11560, 11961, 12244, 12362, 12488, 13259, 13621, 14445, 14835, 15006, 17746, 18808, 19285, 19426, 19491, 21270, 21661, 22098, 23352, 23840, 23869, 25148, 25928, 27102, 27856, 28387, 29942, 30041, 30251, 32396, 32742, 32941, 33628, 34116, 34648, 34859, 35977, 35987, 36035, 36456, 37028, 37238, 37640, 38107, 38735, 39015, 40984, 41115, 41567, 42397, 43219, 43783, 44599, 44980, 45239, 47676, 48922, 49534, 50282, 50683, 50804, 50919, 51076, 51211, 52000, 52183, 52489, 52595, 53884, 54726, 55795, 56992].

    The translation quality has not been verified. Use at your own risk!

    Licensing Information

    Licensing info for Stack Overflow Questions is listed as Apache 2.0. If you use the current dataset, you should also adhere to the original license.

    This text was generated (either in part or in full) with GPT-3 (gpt-3.5-turbo), OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.

    If you use this dataset, you must also follow the Sharing and Usage policies.

    As clearly stated in their Terms of Use, specifically 2c.iii, "[you may not] use output from the Services to develop models that compete with OpenAI". That means that you cannot use this dataset to build models that are intended to commercially compete with OpenAI. As far as I am aware, that is a specific restriction that should serve as an addendum to the current license.

    This dataset is also available on the Hugging Face hub with the same DOI and license. See that README for more info.

  4. sql-create-context

    • huggingface.co
    • opendatalab.com
    Updated Apr 21, 2023
    + more versions
    Cite
    brianm (2023). sql-create-context [Dataset]. https://huggingface.co/datasets/b-mc2/sql-create-context
    Explore at:
    Dataset updated
    Apr 21, 2023
    Authors
    brianm
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    This dataset builds on WikiSQL and Spider. There are 78,577 examples of natural language queries, SQL CREATE TABLE statements, and the SQL query answering the question using the CREATE statement as context. The dataset was built with text-to-SQL LLMs in mind, intending to prevent the hallucination of column and table names often seen when training on text-to-SQL datasets. The CREATE TABLE statement can often be copied and pasted from different DBMSs and provides table names… See the full description on the dataset page: https://huggingface.co/datasets/b-mc2/sql-create-context.
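As a hedged illustration of the intended use (the field names question, context, and answer follow the dataset page; the prompt layout itself is an assumption), a training prompt might be assembled like this:

```python
def make_prompt(example):
    """Render one example as an SQL-commented prompt, schema first."""
    return (f"-- Schema:\n{example['context']}\n"
            f"-- Question: {example['question']}\n"
            f"-- SQL:\n{example['answer']}")
```

Giving the model the CREATE TABLE statement as context is what lets it ground column and table names instead of hallucinating them.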

  5. IoT-SQL Dataset: A Benchmark for Text-to-SQL and IoT Threat Classification

    • zenodo.org
    application/gzip, zip
    Updated Mar 10, 2025
    Cite
    Ryan Palvich; Nima Ebadi; Richard Tarbell; Billy Linares; Adrian Tan; Rachael Humphreys; Jayanta Kumar Das; Rambod Ghandiparsi; Hannah Haley; Jerris George; Rocky Slavin; Kim-Kwan Raymond Choo; Glenn Dietrich; Anthony Rios (2025). IoT-SQL Dataset: A Benchmark for Text-to-SQL and IoT Threat Classification [Dataset]. http://doi.org/10.5281/zenodo.15000588
    Explore at:
    Available download formats: zip, application/gzip
    Dataset updated
    Mar 10, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ryan Palvich; Nima Ebadi; Richard Tarbell; Billy Linares; Adrian Tan; Rachael Humphreys; Jayanta Kumar Das; Rambod Ghandiparsi; Hannah Haley; Jerris George; Rocky Slavin; Kim-Kwan Raymond Choo; Glenn Dietrich; Anthony Rios
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    IoT-SQL Dataset: A Benchmark for Text-to-SQL and IoT Threat Classification

    Overview

    This dataset accompanies the paper:

    "Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats"

    Published in TrustNLP: Fifth Workshop on Trustworthy Natural Language Processing, colocated with NAACL 2025.

    This dataset is designed to facilitate research in:

    • Text-to-SQL: Generating SQL queries from natural language.
    • IoT Network Traffic Analysis: Detecting malicious activity in IoT environments.
    • Multimodal Learning: Combining structured database queries with network security classification.

    Dataset Contents

    The dataset consists of three main components:

    1. IoT Database (iot_database.sql.gz)

    • SQL schema and data from the IoT-23 logs and Smart Building Sensor datasets, to be loaded into a database.

    2. Text-to-SQL Data (text-to-SQL-data.zip)

    • Includes queries with joins, aggregations, temporal conditions, and nested clauses.
    • Data split into training (6,591), validation (2,197), and test (2,197) sets.

    3. Network Traffic Data (network_traffic_data.zip)

    • Each record labeled as benign or malicious.
    • Features include timestamps, IPs, ports, protocols, byte counts, and connection history.
    • Malicious traffic includes DDoS, C&C, and botnet-related activity.

    Usage Instructions

    Setting Up the Database

    1. Extract the database file:
      gunzip iot_database.sql.gz
      
    2. Import into MySQL:
      mysql -u 
    3. Verify the schema:
      SHOW TABLES;
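As a hypothetical alternative to a full MySQL import, the compressed dump can be inspected directly from Python. The file name follows the listing above; the helper itself is not part of the release:

```python
import gzip
import re

def list_tables(dump_path):
    """Peek inside a gzipped SQL dump and list the tables it defines."""
    with gzip.open(dump_path, "rt", encoding="utf-8", errors="replace") as f:
        sql = f.read()
    # Match `CREATE TABLE name` with or without backtick quoting.
    return re.findall(r"CREATE TABLE\s+`?(\w+)`?", sql, flags=re.IGNORECASE)

# Example (file name from the release): list_tables("iot_database.sql.gz")
```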
      

    Citation

    If you use this dataset, please cite:

    @inproceedings{pavlich2025beyond,
     author = {Ryan Pavlich and Nima Ebadi and Richard Tarbell and Billy Linares and Adrian Tan and Rachael Humphreys and Jayanta Kumar Das and Rambod Ghandiparsi and Hannah Haley and Jerris George and Rocky Slavin and Kim-Kwang Raymond Choo and Glenn Dietrich and Anthony Rios},
     title = {Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats},
     booktitle = {TrustNLP: Fifth Workshop on Trustworthy Natural Language Processing},
     year = {2025},
     organization = {NAACL}
    }
    

    Contact

    For questions or collaborations, contact Anthony Rios at Anthony.Rios@utsa.edu.

  6. odoo-sql-query-dataset

    • huggingface.co
    Updated Mar 12, 2025
    Cite
    Vinay Rana (2025). odoo-sql-query-dataset [Dataset]. https://huggingface.co/datasets/VPCSinfo/odoo-sql-query-dataset
    Explore at:
    Dataset updated
    Mar 12, 2025
    Authors
    Vinay Rana
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Odoo SQL Query Dataset

    This dataset contains natural language to SQL query pairs specifically for Odoo 17.0 Community Edition. It's designed to help train and fine-tune language models for generating accurate SQL queries for Odoo databases.

      Dataset Description

      Overview
    The dataset consists of 6815 carefully curated examples of natural language questions paired with their corresponding SQL queries for Odoo databases. Each example includes detailed instructions… See the full description on the dataset page: https://huggingface.co/datasets/VPCSinfo/odoo-sql-query-dataset.

  7. ReCiterAnalysis.sql

    • plos.figshare.com
    txt
    Updated Jun 11, 2023
    Cite
    Paul J. Albert; Sarbajit Dutta; Jie Lin; Zimeng Zhu; Michael Bales; Stephen B. Johnson; Mohammad Mansour; Drew Wright; Terrie R. Wheeler; Curtis L. Cole (2023). ReCiterAnalysis.sql. [Dataset]. http://doi.org/10.1371/journal.pone.0244641.s006
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Paul J. Albert; Sarbajit Dutta; Jie Lin; Zimeng Zhu; Michael Bales; Stephen B. Johnson; Mohammad Mansour; Drew Wright; Terrie R. Wheeler; Curtis L. Cole
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Database model for storing output of Feature Generator API. Includes some sample data. (SQL)

  8. pyKinML database

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Apr 10, 2024
    Cite
    Carles Martí; Judit Zádor; Habib Najm; Christian Devereux (2024). pyKinML database [Dataset]. http://doi.org/10.5281/zenodo.10892202
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 10, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Carles Martí; Judit Zádor; Habib Najm; Christian Devereux
    Description
    This is a set of SQL databases with information about molecules and radicals with the following database conventions and content.
    • Each database is named as “CnHm”, where n and m range from 1 to 5 and indicate the number of carbon (C) and hydrogen (H) atoms in the structures in the database.
    • Each database contains entries for a large number of “CnHm” geometries. Within each database are four tables.
    Table “meta”
    Contains the name of the database and the date it was created.
    Table “xyz”
    The columns of the “xyz” table are the following:
    • “id”: a numerical identification number, integer
    • “calc_params”: metadata describing the level of theory and other details of the quantum chemical calculations to generate the Hessian used for generating this structure, numpy array stored as a blob.
    • “calc”: software used for the calculation, string
    • “temp”: the temperature used during the normal mode sampling process to generate the structure in K, outliers are assigned negative temperature, float
    • “name”: a unique name describing the anchor point the structure was generated from, string
    • “dist”: the normalized unitless distance of the structure from its anchor point at the temperature given in “temp”, float
    • “geometry”: atomic coordinates in angstroms, (n+m) by 3 numpy array of floats stored as a blob. Note that the atomic positions are listed with carbons first followed by hydrogens.
    • “created_at”: date the structure was generated
    Table “energy”
    The columns of the “energy” table are the following:
    • “id”: a numerical identification number, not linked to the “id” in “xyz”, integer
    • “fidelity”: the fidelity level the energy was calculated at, integer
      • 0 = B3LYP/6-31+G(d)
      • 1 = wB97X-D/6-311++G(d,p)
      • 2 = HF/6-31G
      • 3 = B3LYP/6-31G
      • 4 = B2PLYPD3/6-311++G(d,p)
    • “E”: molecular energy in eV, float
    • “xyz_id”: the “id” of the geometry in the “xyz” table this energy was calculated for, integer
    • “hessian”: empty
    • “forces”: atomic forces in eV/angstrom, (n+m) by 3 numpy array of floats stored as a blob. Note that the atomic forces are listed in the same order as the atoms in the “geometry” in the “xyz” table
    • “calc_params”: metadata describing the level of theory and other details of the energy and force calculations of this entry, numpy array stored as a blob
    • “calc”: software used for the energy and force calculation
    • “created_at”: date the energy and forces were calculated
    • “sample_set_id”: empty
    Table “aev”: currently empty
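A hedged sketch of reading a geometry back out of the “xyz” table. It assumes the databases are SQLite files and that the blob is a packed array of little-endian float64 values (numpy's default dtype); both assumptions are unverified against the release:

```python
import sqlite3
import struct

def read_geometry(db_path, struct_id, n_atoms):
    """Decode one geometry blob into an n_atoms x 3 list of coordinates.

    ASSUMPTION: the blob is n_atoms * 3 little-endian float64 values,
    carbons first, then hydrogens, as described in the column list above.
    """
    con = sqlite3.connect(db_path)
    (blob,) = con.execute(
        "SELECT geometry FROM xyz WHERE id = ?", (struct_id,)).fetchone()
    con.close()
    flat = struct.unpack(f"<{n_atoms * 3}d", blob)
    return [list(flat[i * 3:(i + 1) * 3]) for i in range(n_atoms)]
```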
  9. Data from: Acquisition of germ plasm accelerates vertebrate evolution

    • datadryad.org
    • explore.openaire.eu
    • +2 more
    zip
    Updated Apr 9, 2015
    Cite
    Teri Evans; Christopher M. Wade; Frank A. Chapman; Andrew D. Johnson; Matthew Loose (2015). Acquisition of germ plasm accelerates vertebrate evolution. [Dataset]. http://doi.org/10.5061/dryad.rd70f
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 9, 2015
    Dataset provided by
    Dryad
    Authors
    Teri Evans; Christopher M. Wade; Frank A. Chapman; Andrew D. Johnson; Matthew Loose
    Time period covered
    2015
    Description

    Scripts for Analyses (Scripts.tar.gz)

    This file contains the instructions for generating the appropriate MySQL databases and the scripts required to process these data and recreate the analyses described in Evans et al.

    Alignment Files (Alignment_files.tar.gz)

    This file contains the folder system storing the alignments (.nex) and Maximum-Likelihood tree files (_tree.tre) for the sequences presented in Evans et al. Each file is named according to the seq_number information as in the database basic_db.sql (also contained within this Dryad submission). Each alignment contains only the sequence from the first two codon positions as reported in Evans et al.

    MySQL Database including data (basic_db.sql)

    This file contains a MySQL database including all the data used in Evans et al. It is in the same format as the database which can be created using the files available in the Scripts package associated with this Dryad package.

  10. Share of GenAI startups in India 2024, by segment

    • statista.com
    Updated Jul 25, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2024). Share of GenAI startups in India 2024, by segment [Dataset]. https://www.statista.com/statistics/1481399/india-share-of-genai-startups-by-segment/
    Explore at:
    Dataset updated
    Jul 25, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    India
    Description

    As of July 2024, more than one-third of India's generative artificial intelligence (GenAI) startups were in the code and data segment, followed by audio and video segment startups at 27 percent. Code-and-data GenAI provides features such as generating code and documents, as well as converting text to SQL.

  11. Oracle: revenue by segment 2008-2024

    • statista.com
    • flwrdeptvarieties.store
    Updated Jul 1, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2024). Oracle: revenue by segment 2008-2024 [Dataset]. https://www.statista.com/statistics/269728/oracles-revenue-by-business-segment/
    Explore at:
    Dataset updated
    Jul 1, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    Oracle’s cloud services and license support division is the company’s most profitable business segment, bringing in over 39 billion U.S. dollars in its 2024 fiscal year. In that year, Oracle brought in annual revenue of close to 52 billion U.S. dollars, its highest revenue figure to date.

    Oracle Corporation

    Oracle was founded by Larry Ellison in 1977 as a tech company primarily focused on relational databases. Today, Oracle ranks among the largest companies in the world in terms of market value and serves as the world’s most popular database management system provider. Oracle’s success is not only reflected in its booming sales figures, but also in its growing number of employees: between fiscal years 2008 and 2021, Oracle’s total employee count grew substantially, from around 84,000 to 132,000.

    Database market

    The global database market reached a size of 65 billion U.S. dollars in 2020. Database Management Systems (DBMSs) provide a platform through which developers can organize, update, and control large databases, with products like Oracle, MySQL, and Microsoft SQL Server being the most widely used in the market.
