11 datasets found

SQL Injection Attack Netflow
data.niaid.nih.gov
zenodo.org
Updated Sep 28, 2022
Adrián Campazas (2022). SQL Injection Attack Netflow [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6907251
Dataset updated
Sep 28, 2022
Dataset provided by
Adrián Campazas
Ignacio Crespo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction

These datasets contain SQL injection attacks (SQLIA) as malicious NetFlow data. The attacks carried out are Union query SQL injection and Blind SQL injection. The attacks were performed with the SQLMAP tool.

NetFlow traffic was generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for collecting and monitoring network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.

Datasets

The first dataset (D1) was collected to train the detection models. The second (D2) was collected using different attacks than those used in training, to test the models and check that they generalize.

The datasets contain both benign and malicious traffic. All collected datasets are balanced.

The version of NetFlow used to build the datasets is 5.

| Dataset | Aim | Samples | Benign-malicious traffic ratio |
|---|---|---|---|
| D1 | Training | 400,003 | 50% |
| D2 | Test | 57,239 | 50% |

Infrastructure and implementation

Two sets of flow data were collected with DOROTHEA. DOROTHEA is a Docker-based framework for NetFlow data collection. It allows you to build interconnected virtual networks to generate and collect flow data using the NetFlow protocol. In DOROTHEA, network traffic packets are sent to a NetFlow generator that has a sensor ipt_netflow installed. The sensor consists of a module for the Linux kernel using Iptables, which processes the packets and converts them to NetFlow flows.

DOROTHEA is configured to use NetFlow v5 and to export a flow after it has been inactive for 15 seconds or active for 1800 seconds (30 minutes).
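The export rule above can be sketched as follows. This is a simplified illustration, not the ipt_netflow implementation: the `Flow` class, the `aggregate` function, and the 5-tuple flow key are assumptions made for the example, and timeouts are only checked when a new packet arrives, whereas a real exporter would use timers.

```python
from dataclasses import dataclass

INACTIVE_TIMEOUT = 15    # seconds without packets before a flow is exported
ACTIVE_TIMEOUT = 1800    # maximum lifetime of a flow in seconds (30 minutes)


@dataclass
class Flow:
    key: tuple           # assumed key: (src_ip, dst_ip, src_port, dst_port, protocol)
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


def aggregate(packets):
    """Group (timestamp, key, size) packets into exported flows.

    Packets must be sorted by timestamp. Flows are unidirectional, so the
    reverse direction of a connection has its own key and its own flow.
    """
    active = {}
    exported = []
    for ts, key, size in packets:
        # Export every flow that has timed out by the time this packet arrives.
        for k in list(active):
            flow = active[k]
            inactive = ts - flow.last_seen >= INACTIVE_TIMEOUT
            too_old = ts - flow.first_seen >= ACTIVE_TIMEOUT
            if inactive or too_old:
                exported.append(active.pop(k))
        flow = active.get(key)
        if flow is None:
            flow = active[key] = Flow(key, ts, ts)
        flow.last_seen = ts
        flow.packets += 1
        flow.octets += size
    exported.extend(active.values())  # flush whatever is left at end of capture
    return exported
```

With this sketch, two packets of the same flow that are more than 15 seconds apart end up in two separate exported flows, which is why a single long-lived connection can appear as several records in the datasets.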

Benign traffic generation nodes simulate network traffic generated by real users, performing tasks such as searching in web browsers, sending emails, or establishing Secure Shell (SSH) connections. Such tasks run as Python scripts. Users may customize them or even incorporate their own. The network traffic is managed by a gateway that performs two main tasks. On the one hand, it routes packets to the Internet. On the other hand, it forwards them to a NetFlow data generation node (packets received from the Internet are handled in the same way).

The malicious traffic (SQLI attacks) was generated using SQLMAP. SQLMAP is a penetration testing tool that automates the detection and exploitation of SQL injection vulnerabilities.

The attacks were executed from 16 nodes, each launching SQLMAP with the parameters in the following table.

| Parameters | Description |
|---|---|
| '--banner','--current-user','--current-db','--hostname','--is-dba','--users','--passwords','--privileges','--roles','--dbs','--tables','--columns','--schema','--count','--dump','--comments','--schema' | Enumerate users, password hashes, privileges, roles, databases, tables and columns |
| --level=5 | Increase the probability of a false positive identification |
| --risk=3 | Increase the probability of extracting data |
| --random-agent | Select the User-Agent randomly |
| --batch | Never ask for user input, use the default behavior |
| --answers="follow=Y" | Predefined answers to yes |

Every node executed SQLIA on 200 victim nodes. The victim nodes had deployed a web form vulnerable to Union-type injection attacks, which was connected to either the MySQL or the SQL Server database engine (50% of the victim nodes deployed MySQL and the other 50% deployed SQL Server).

The web service was accessible from ports 443 and 80, which are the ports typically used to deploy web services. The IP address space was 182.168.1.1/24 for the benign and malicious traffic-generating nodes. For victim nodes, the address space was 126.52.30.0/24. The malicious traffic in the two datasets was collected under different conditions. For D1, SQLIA was performed using Union attacks on the MySQL and SQL Server databases.

However, for D2, Blind SQL injection attacks were performed against the web form connected to a PostgreSQL database. The IP address spaces of the networks were also different from those of D1. In D2, the IP address space was 152.148.48.1/24 for the benign and malicious traffic-generating nodes and 140.30.20.1/24 for victim nodes.
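As a rough illustration of how these address spaces could be used when inspecting flows, the sketch below maps an IP address to its role in D1 or D2 with Python's standard `ipaddress` module. The `NETWORKS` table and the `classify` function are hypothetical names introduced for this example. Note that several of the networks are written with host bits set (for example 182.168.1.1/24), so `strict=False` is needed to parse them.

```python
import ipaddress

# Address spaces as given in the description. strict=False accepts
# notations such as 182.168.1.1/24, where the host bits are not zero.
NETWORKS = {
    "D1": {
        "generator": ipaddress.ip_network("182.168.1.1/24", strict=False),
        "victim": ipaddress.ip_network("126.52.30.0/24"),
    },
    "D2": {
        "generator": ipaddress.ip_network("152.148.48.1/24", strict=False),
        "victim": ipaddress.ip_network("140.30.20.1/24", strict=False),
    },
}


def classify(address, dataset):
    """Return 'generator', 'victim', or 'external' for an address in D1 or D2."""
    ip = ipaddress.ip_address(address)
    for role, network in NETWORKS[dataset].items():
        if ip in network:
            return role
    return "external"
```

Because the address spaces differ between D1 and D2, a detection model that relies on raw IP addresses would not transfer from the training set to the test set, so a lookup like this is mainly useful for exploration rather than as a feature.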

The MySQL server was MariaDB version 10.4.12. Microsoft SQL Server 2017 Express and PostgreSQL version 13 were also used.
WikiSQL Dataset
paperswithcode.com
opendatalab.com
WikiSQL Dataset [Dataset]. https://paperswithcode.com/dataset/wikisql
Authors
Victor Zhong; Caiming Xiong; Richard Socher
Description
WikiSQL consists of a corpus of 87,726 hand-annotated SQL query and natural language question pairs. These SQL queries are further split into training (61,297 examples), development (9,145 examples) and test sets (17,284 examples). It can be used for natural language inference tasks related to relational databases.
Stack Overflow Chat Dutch
data.niaid.nih.gov
huggingface.co
+1more
Updated Jun 20, 2023
Vanroy, Bram (2023). Stack Overflow Chat Dutch [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8052379
Dataset updated
Jun 20, 2023
Dataset authored and provided by
Vanroy, Bram
License
Apache License 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
Description
This dataset contains 56,964 conversations in Dutch between an AI assistant and a (fake, generated) "Human", specifically in the domain of programming (Stack Overflow). They are translations of Baize's machine-generated answers to the Stack Overflow dataset.

Data Instances

{'id': 9, 'input': 'Het gesprek tussen de mens en de AI-assistent. ' '[|Human|] Ik heb wat hulp nodig met Spark SQL union en het ordenen ' 'van kolommen. ' '[|AI|] Natuurlijk, ik help je graag. Waar heb je specifiek hulp ' 'bij nodig? ' '[|Human|] Ik wil de output van een UNION-query ordenen op een ' 'kolom die niet aanwezig is in de SELECT-instructie. Is dat ' 'mogelijk? ' '[|AI|] Ja, het is mogelijk met Spark SQL. Je kunt de ordinale ' 'positie van de kolom in de ORDER BY-clausule gebruiken in plaats ' 'van de kolomnaam. Als je bijvoorbeeld wilt ordenen op de derde ' 'kolom in de SELECT-instructie, kun je ORDER BY 3 gebruiken. ' '[|Human|] Dat is nuttig. Kan ik ook een alias gebruiken in de ' 'ORDER BY-clausule? ' '[|AI|] Ja, je kunt een alias gebruiken in de ORDER BY-clausule. De ' 'alias moet worden gedefinieerd in de SELECT-instructie. Als je ' 'bijvoorbeeld de derde kolom als "col3" alias geeft, kun je ordenen ' 'met ORDER BY col3. ' '[|Human|] Oké, ik denk dat ik het begrijp. Kan ik dezelfde alias ' 'gebruiken voor kolommen uit beide tabellen in de UNION? ' '[|AI|] Nee, je kunt dezelfde alias niet gebruiken voor kolommen ' 'uit beide tabellen in de UNION. Elke tabel in de UNION moet een ' 'unieke alias hebben voor de kolommen. ' '[|Human|] ', 'topic': 'Spark SQL UNION - ORDER BY kolom niet in SELECT'},

Data Fields

id: the ID of the item. The following 82 IDs are not included because they could not be translated: [1713, 1937, 1960, 4326, 4356, 8357, 8542, 8827, 9137, 9782, 11560, 11961, 12244, 12362, 12488, 13259, 13621, 14445, 14835, 15006, 17746, 18808, 19285, 19426, 19491, 21270, 21661, 22098, 23352, 23840, 23869, 25148, 25928, 27102, 27856, 28387, 29942, 30041, 30251, 32396, 32742, 32941, 33628, 34116, 34648, 34859, 35977, 35987, 36035, 36456, 37028, 37238, 37640, 38107, 38735, 39015, 40984, 41115, 41567, 42397, 43219, 43783, 44599, 44980, 45239, 47676, 48922, 49534, 50282, 50683, 50804, 50919, 51076, 51211, 52000, 52183, 52489, 52595, 53884, 54726, 55795, 56992]

input: the machine-generated conversation between AI and "Human". It always starts with "Het gesprek tussen de mens en de AI-assistent." (Dutch for "The conversation between the human and the AI assistant.") and has at least one occurrence of both [|AI|] and [|Human|].

topic: the topic description
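Given the structure of the input field described above, a conversation can be split into turns with a small parser. This is a minimal sketch: `split_turns` is a hypothetical helper, not part of the dataset's tooling, and it assumes the fixed Dutch prefix and the two speaker identifiers exactly as documented.

```python
import re

PREFIX = "Het gesprek tussen de mens en de AI-assistent."
SPEAKER = re.compile(r"\[\|(Human|AI)\|\]")


def split_turns(text):
    """Split an `input` string into (speaker, utterance) pairs."""
    if not text.startswith(PREFIX):
        raise ValueError("unexpected conversation prefix")
    body = text[len(PREFIX):]
    # With one capture group, re.split returns:
    # [text before first marker, speaker, utterance, speaker, utterance, ...]
    parts = SPEAKER.split(body)
    speakers, utterances = parts[1::2], parts[2::2]
    return [(speaker, utterance.strip()) for speaker, utterance in zip(speakers, utterances)]
```

Each conversation in the sample instance ends with an open "[|Human|] " marker, so the last pair returned by this sketch has an empty utterance.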

Dataset Creation

Both the conversations and the topics were translated with OpenAI's API for gpt-3.5-turbo, using max_tokens=1024 and temperature=0 as parameters.

The prompt template to translate the input is (where src_lang was English and tgt_lang Dutch):

CONVERSATION_TRANSLATION_PROMPT = """You are asked to translate a conversation between an AI assistant and a human from {src_lang} into {tgt_lang}.

Here are the requirements that you should adhere to:
1. maintain the format: the conversation consists of the AI (marked as [|AI|]) and the human ([|Human|]) talking in turns and responding to each other;
2. do not translate the speaker identifiers [|AI|] and [|Human|] but always copy them into the translation in appropriate places;
3. ensure accurate translation and keep the correctness of the conversation;
4. make sure that text is fluent to read and does not contain grammatical errors. Use standard {tgt_lang} without regional bias;
5. translate the human's text using informal, but standard, language;
6. make sure to avoid biases (such as gender bias, grammatical bias, social bias);
7. if the human asks to correct grammar mistakes or spelling mistakes then you have to generate a similar mistake in {tgt_lang}, and then also generate a corrected output version for the AI in {tgt_lang};
8. if the human asks to translate text from one to another language, then you only translate the human's question to {tgt_lang} but you keep the translation that the AI provides in the language that the human requested;
9. do not translate code fragments but copy them as they are. If there are English examples, variable names or definitions in code fragments, keep them in English.

Now translate the following conversation with the requirements set out above. Do not provide an explanation and do not add anything else.

"""

The prompt to translate the topic is:

TOPIC_TRANSLATION_PROMPT = (
    "Translate the following title of a conversation from {src_lang} to {tgt_lang} in a succinct,"
    " summarizing manner. Translate accurately and formally. Do not provide any explanation"
    " about the translation and do not include the original title.\n\n"
)

The system message was:

You are a helpful assistant that translates English to Dutch to the requirements that are given to you.

Note that 82 items (0.1%) were not successfully translated. The translation was missing the AI identifier [|AI|] and/or the human one [|Human|]. The IDs for the missing items are [1713, 1937, 1960, 4326, 4356, 8357, 8542, 8827, 9137, 9782, 11560, 11961, 12244, 12362, 12488, 13259, 13621, 14445, 14835, 15006, 17746, 18808, 19285, 19426, 19491, 21270, 21661, 22098, 23352, 23840, 23869, 25148, 25928, 27102, 27856, 28387, 29942, 30041, 30251, 32396, 32742, 32941, 33628, 34116, 34648, 34859, 35977, 35987, 36035, 36456, 37028, 37238, 37640, 38107, 38735, 39015, 40984, 41115, 41567, 42397, 43219, 43783, 44599, 44980, 45239, 47676, 48922, 49534, 50282, 50683, 50804, 50919, 51076, 51211, 52000, 52183, 52489, 52595, 53884, 54726, 55795, 56992].
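The failure criterion described above (a translation that lost one or both speaker identifiers) can be expressed as a simple check. This is only a sketch of such a filter: `has_speaker_ids` and `partition` are hypothetical names, and the percentage calculation assumes that the 56,964 conversations in the dataset exclude the 82 failed items.

```python
AI_ID, HUMAN_ID = "[|AI|]", "[|Human|]"


def has_speaker_ids(translation):
    """True if both speaker identifiers survived translation."""
    return AI_ID in translation and HUMAN_ID in translation


def partition(translations):
    """Split a {id: translated_text} mapping into (kept, failed_ids)."""
    kept = {i: t for i, t in translations.items() if has_speaker_ids(t)}
    failed_ids = sorted(i for i in translations if i not in kept)
    return kept, failed_ids


# Assumption: 56,964 counts only the successfully translated conversations,
# so the original total is 56,964 + 82.
failure_pct = 100 * 82 / (56_964 + 82)
```

Under that assumption the failure rate is about 0.14%, which is consistent with the rounded 0.1% quoted above.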

The translation quality has not been verified. Use at your own risk!

Licensing Information

Licensing info for Stack Overflow Questions is listed as Apache 2.0. If you use the current dataset, you should also adhere to the original license.

This text was generated (either in part or in full) with GPT-3 (gpt-3.5-turbo), OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.

If you use this dataset, you must also follow the Sharing and Usage policies.

As clearly stated in their Terms of Use, specifically 2c.iii, "[you may not] use output from the Services to develop models that compete with OpenAI". That means that you cannot use this dataset to build models that are intended to commercially compete with OpenAI. As far as I am aware, that is a specific restriction that should serve as an addendum to the current license.

This dataset is also available on the Hugging Face hub with the same DOI and license. See that README for more info.
sql-create-context
huggingface.co
opendatalab.com
Updated Apr 21, 2023
brianm (2023). sql-create-context [Dataset]. https://huggingface.co/datasets/b-mc2/sql-create-context
Dataset updated
Apr 21, 2023
Authors
brianm
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overview

This dataset builds on WikiSQL and Spider. There are 78,577 examples of natural language queries, SQL CREATE TABLE statements, and SQL queries answering the question using the CREATE statement as context. This dataset was built with text-to-SQL LLMs in mind, intending to prevent the hallucination of column and table names often seen when models are trained on text-to-SQL datasets. The CREATE TABLE statement can often be copied and pasted from different DBMS and provides table names… See the full description on the dataset page: https://huggingface.co/datasets/b-mc2/sql-create-context.
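One way to see why the CREATE TABLE context helps against hallucinated names is to execute a candidate query against the empty schema: a query that refers to a column or table that is not in the context fails immediately. The sketch below uses Python's standard `sqlite3` module. The `runs_against_context` helper, the field names `question`, `context` and `answer`, and the sample record are assumptions made for illustration, and statements written for other DBMS dialects may not be accepted by SQLite.

```python
import sqlite3


def runs_against_context(context, query):
    """Return True if `query` executes against the schema defined in `context`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(context)      # create the (empty) tables
        conn.execute(query).fetchall()   # fails on unknown tables or columns
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


# Hypothetical record in the question / context / answer layout.
record = {
    "question": "How many singers are older than 30?",
    "context": "CREATE TABLE singer (name VARCHAR, age INTEGER)",
    "answer": "SELECT COUNT(*) FROM singer WHERE age > 30",
}
```

For this record, `runs_against_context(record["context"], record["answer"])` succeeds, while a query that selects a column missing from the context, such as `SELECT salary FROM singer`, is rejected.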
IoT-SQL Dataset: A Benchmark for Text-to-SQL and IoT Threat Classification
zenodo.org
application/gzip, zip
Updated Mar 10, 2025
Ryan Pavlich; Nima Ebadi; Richard Tarbell; Billy Linares; Adrian Tan; Rachael Humphreys; Jayanta Kumar Das; Rambod Ghandiparsi; Hannah Haley; Jerris George; Rocky Slavin; Kim-Kwang Raymond Choo; Glenn Dietrich; Anthony Rios (2025). IoT-SQL Dataset: A Benchmark for Text-to-SQL and IoT Threat Classification [Dataset]. http://doi.org/10.5281/zenodo.15000588
Available download formats: zip, application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.15000588
Dataset updated
Mar 10, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Ryan Pavlich; Nima Ebadi; Richard Tarbell; Billy Linares; Adrian Tan; Rachael Humphreys; Jayanta Kumar Das; Rambod Ghandiparsi; Hannah Haley; Jerris George; Rocky Slavin; Kim-Kwang Raymond Choo; Glenn Dietrich; Anthony Rios
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
IoT-SQL Dataset: A Benchmark for Text-to-SQL and IoT Threat Classification

Overview

This dataset accompanies the paper:

"Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats"

Published in TrustNLP: Fifth Workshop on Trustworthy Natural Language Processing, colocated with NAACL 2025.

This dataset is designed to facilitate research in:

Text-to-SQL: Generating SQL queries from natural language.

IoT Network Traffic Analysis: Detecting malicious activity in IoT environments.

Multimodal Learning: Combining structured database queries with network security classification.

Dataset Contents

The dataset consists of three main components:

1. IoT Database (iot_database.sql.gz)

SQL schema and data from IoT-23 logs and Smart Building Sensor datasets to be put into a database.

2. Text-to-SQL Data (text-to-SQL-data.zip)

Includes queries with joins, aggregations, temporal conditions, and nested clauses.

Data split into training (6,591), validation (2,197), and test (2,197) sets.

3. Network Traffic Data (network_traffic_data.zip)

Each record labeled as benign or malicious.

Features include timestamps, IPs, ports, protocols, byte counts, and connection history.

Malicious traffic includes DDoS, C&C, and botnet-related activity.

Usage Instructions

Setting Up the Database

Extract the database file:
gunzip iot_database.sql.gz

Import into MySQL:
mysql -u

Verify the schema:
SHOW TABLES;

Citation

If you use this dataset, please cite:

@inproceedings{pavlich2025beyond,
  author = {Ryan Pavlich and Nima Ebadi and Richard Tarbell and Billy Linares and Adrian Tan and Rachael Humphreys and Jayanta Kumar Das and Rambod Ghandiparsi and Hannah Haley and Jerris George and Rocky Slavin and Kim-Kwang Raymond Choo and Glenn Dietrich and Anthony Rios},
  title = {Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats},
  booktitle = {TrustNLP: Fifth Workshop on Trustworthy Natural Language Processing},
  year = {2025},
  organization = {NAACL}
}

Contact

For questions or collaborations, contact Anthony Rios at Anthony.Rios@utsa.edu.
odoo-sql-query-dataset
huggingface.co
Updated Mar 12, 2025
Vinay Rana (2025). odoo-sql-query-dataset [Dataset]. https://huggingface.co/datasets/VPCSinfo/odoo-sql-query-dataset
Dataset updated
Mar 12, 2025
Authors
Vinay Rana
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Odoo SQL Query Dataset

This dataset contains natural language to SQL query pairs specifically for Odoo 17.0 Community Edition. It's designed to help train and fine-tune language models for generating accurate SQL queries for Odoo databases.

Dataset Description Overview

The dataset consists of 6815 carefully curated examples of natural language questions paired with their corresponding SQL queries for Odoo databases. Each example includes detailed instructions… See the full description on the dataset page: https://huggingface.co/datasets/VPCSinfo/odoo-sql-query-dataset.
ReCiterAnalysis.sql.
plos.figshare.com
txt
Updated Jun 11, 2023
Paul J. Albert; Sarbajit Dutta; Jie Lin; Zimeng Zhu; Michael Bales; Stephen B. Johnson; Mohammad Mansour; Drew Wright; Terrie R. Wheeler; Curtis L. Cole (2023). ReCiterAnalysis.sql. [Dataset]. http://doi.org/10.1371/journal.pone.0244641.s006
Available download formats: txt
Unique identifier
https://doi.org/10.1371/journal.pone.0244641.s006
Dataset updated
Jun 11, 2023
Dataset provided by
PLOS ONE
Authors
Paul J. Albert; Sarbajit Dutta; Jie Lin; Zimeng Zhu; Michael Bales; Stephen B. Johnson; Mohammad Mansour; Drew Wright; Terrie R. Wheeler; Curtis L. Cole
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Database model for storing output of Feature Generator API. Includes some sample data. (SQL)
pyKinML database
zenodo.org
data.niaid.nih.gov
zip
Updated Apr 10, 2024
Carles Martí; Judit Zádor; Habib Najm; Christian Devereux (2024). pyKinML database [Dataset]. http://doi.org/10.5281/zenodo.10892202
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.10892202
Dataset updated
Apr 10, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Carles Martí; Judit Zádor; Habib Najm; Christian Devereux
Description
This is a set of SQL databases with information about molecules and radicals with the following database conventions and content.

Each database is named as “CnHm”, where n and m range from 1 to 5 and indicate the number of carbon (C) and hydrogen (H) atoms in the structures in the database.

Each database contains entries for a large number of “CnHm” geometries. Within each database are four tables.

Table “meta”
Contains the name of the database and the date it was created.

Table “xyz”
The columns of the “xyz” table are the following:

“id”: a numerical identification number, integer

“calc_params”: metadata describing the level of theory and other details of the quantum chemical calculations to generate the Hessian used for generating this structure, numpy array stored as a blob.

“calc”: software used for the calculation, string

“temp”: the temperature used during the normal mode sampling process to generate the structure in K, outliers are assigned negative temperature, float

“name”: a unique name describing the anchor point the structure was generated from, string

“dist”: the normalized unitless distance of the structure from its anchor point at the temperature given in “temp”, float

“geometry”: atomic coordinates in angstroms, (n+m) by 3 numpy array of floats stored as a blob. Note that the atomic positions are listed with carbons first followed by hydrogens.

“created_at”: date the structure was generated

Table “energy”
The columns of the “energy” table are the following:

“id”: a numerical identification number, not linked to the “id” in “xyz”, integer

“fidelity”: the fidelity level the energy was calculated at, integer

0 = B3LYP/6-31+G(d)

1 = wB97X-D/6-311++G(d,p)

2 = HF/6-31G

3 = B3LYP/6-31G

4 = B2PLYPD3/6-311++G(d,p)

“E”: molecular energy in eV, float

“xyz_id”: the “id” of the geometry in the “xyz” table that this energy was calculated for, integer

“hessian”: empty

“forces”: atomic forces in eV/angstrom, (n+m) by 3 numpy array of floats stored as a blob. Note that the atomic forces are listed in the same order as the atoms in the “geometry” in the “xyz” table

“calc_params”: metadata describing the level of theory and other details of the energy and force calculations of this entry, numpy array stored as a blob

“calc”: software used for the energy and force calculation

“created_at”: date the energy and forces were calculated

“sample_set_id”: empty

Table “aev”: currently empty
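The relationship between the two populated tables (each “energy” row points to a geometry through “xyz_id”, and is tagged with a “fidelity” level) can be illustrated with a small query. This sketch builds a toy in-memory SQLite database with a reduced set of columns and made-up values. Whether the published files are SQLite databases is an assumption, the blob columns are left out, and `energies_at` is a hypothetical helper.

```python
import sqlite3

# Fidelity levels as listed in the description of the "energy" table.
FIDELITY = {
    0: "B3LYP/6-31+G(d)",
    1: "wB97X-D/6-311++G(d,p)",
    2: "HF/6-31G",
    3: "B3LYP/6-31G",
    4: "B2PLYPD3/6-311++G(d,p)",
}

# Toy stand-in for one "CnHm" database; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE xyz (id INTEGER PRIMARY KEY, name TEXT, temp REAL, dist REAL);
CREATE TABLE energy (id INTEGER PRIMARY KEY, fidelity INTEGER, E REAL, xyz_id INTEGER);
INSERT INTO xyz VALUES (1, 'anchor_a', 300.0, 0.12), (2, 'anchor_a', 600.0, 0.40);
INSERT INTO energy VALUES (10, 0, -1.0, 1), (11, 1, -1.1, 1), (12, 0, -0.9, 2);
""")


def energies_at(conn, fidelity):
    """Return (xyz id, name, temp, E) for every energy computed at one fidelity level."""
    rows = conn.execute(
        "SELECT xyz.id, xyz.name, xyz.temp, energy.E "
        "FROM energy JOIN xyz ON energy.xyz_id = xyz.id "
        "WHERE energy.fidelity = ? ORDER BY xyz.id",
        (fidelity,),
    )
    return rows.fetchall()
```

The join goes through “xyz_id” rather than “id” because, as noted above, the “id” column of “energy” is not linked to the “id” column of “xyz”; one geometry can therefore have several energies, one per fidelity level.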
Data from: Acquisition of germ plasm accelerates vertebrate evolution.
datadryad.org
explore.openaire.eu
+2more
zip
Updated Apr 9, 2015
Share
Teri Evans; Christopher M. Wade; Frank A. Chapman; Andrew D. Johnson; Matthew Loose (2015). Acquisition of germ plasm accelerates vertebrate evolution. [Dataset]. http://doi.org/10.5061/dryad.rd70f
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.rd70f
Dataset updated
Apr 9, 2015
Dataset provided by
Dryad
Authors
Teri Evans; Christopher M. Wade; Frank A. Chapman; Andrew D. Johnson; Matthew Loose
Time period covered
2015
Description
Scripts for Analyses
This file contains the instructions for generating the appropriate mySQL databases and the scripts required to process this data and recreate the analyses described in Evans et al.
Scripts.tar.gz

Alignment Files
This file contains the folder system storing the alignments (.nex) and Maximum-Likelihood tree files (_tree.tre) for the sequences presented in Evans et al.

Each file is named according to the seq_number information as in the database basic_db.sql (also contained within this DataDryad submission).

Each alignment contains only the sequence from the first two codon positions as reported in Evans et al.
Alignment_files.tar.gz

mySQL Database including data
This file contains a mySQL database including all the data used in Evans et al. It is in the same format as the database which can be created using the files available in the Scripts package associated with this DataDryad package.
basic_db.sql
Share of GenAI startups in India 2024, by segment
statista.com
Updated Jul 25, 2024
Statista (2024). Share of GenAI startups in India 2024, by segment [Dataset]. https://www.statista.com/statistics/1481399/india-share-of-genai-startups-by-segment/
Dataset updated
Jul 25, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
India
Description
As of July 2024, more than one-third of India's generative artificial intelligence (GenAI) startups were in the code and data segment, followed by audio and video segment startups at 27 percent. Code and data GenAI products provide features such as generating code and documents and converting text to SQL.
Oracle: revenue by segment 2008-2024
statista.com
flwrdeptvarieties.store
Updated Jul 1, 2024
Statista (2024). Oracle: revenue by segment 2008-2024 [Dataset]. https://www.statista.com/statistics/269728/oracles-revenue-by-business-segment/
Dataset updated
Jul 1, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
Oracle’s cloud services and license support division is the company’s most profitable business segment, bringing in over 39 billion U.S. dollars in its 2024 fiscal year. In that year, Oracle brought in annual revenue of close to 52 billion U.S. dollars, its highest revenue figure to date.

Oracle Corporation
Oracle was founded by Larry Ellison in 1977 as a tech company primarily focused on relational databases. Today, Oracle ranks among the largest companies in the world in terms of market value and serves as the world’s most popular database management system provider. Oracle’s success is not only reflected in its booming sales figures, but also in its growing number of employees: between fiscal year 2008 and 2021, Oracle’s total employee number has grown substantially, increasing from around 84,000 to 132,000.

Database market
The global database market reached a size of 65 billion U.S. dollars in 2020. Database Management Systems (DBMSs) provide a platform through which developers can organize, update, and control large databases, with products like Oracle, MySQL, and Microsoft SQL Server being the most widely used in the market.