67 datasets found

Online Shopping Store - Web Server Logs
dataverse.harvard.edu
Updated May 20, 2021
Cite
Farzin Zaker (2021). Online Shopping Store - Web Server Logs [Dataset]. http://doi.org/10.7910/DVN/3QBYB5
Unique identifier
https://doi.org/10.7910/DVN/3QBYB5
Dataset updated
May 20, 2021
Dataset provided by
Harvard Dataverse
Authors
Farzin Zaker
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Nginx server access log for an online shopping store
LogHub - Apache Log Data
kaggle.com
zip
Updated Oct 13, 2023
+ more versions
Cite
Om Duggineni (2023). LogHub - Apache Log Data [Dataset]. https://www.kaggle.com/datasets/omduggineni/loghub-apache-log-data
Available download formats: zip (254455 bytes)
Dataset updated
Oct 13, 2023
Authors
Om Duggineni
Description
Dataset

This dataset was created by Om Duggineni

Contents
Server Logs
kaggle.com
zip
Updated Oct 12, 2021
Cite
Vishnu U (2021). Server Logs [Dataset]. https://www.kaggle.com/datasets/vishnu0399/server-logs/code
Available download formats: zip (20565749 bytes)
Dataset updated
Oct 12, 2021
Authors
Vishnu U
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The dataset is a synthetically generated server log based on the Apache server logging format. Each line corresponds to one log entry. Each log entry has the following parameters:

Components in Log Entry :

IP of client: This refers to the IP address of the client that sent the request to the server.

Remote Log Name: Remote name of the User performing the request. In the majority of the applications, this is confidential information and is hidden or not available.

User ID: The ID of the user performing the request. In the majority of the applications, this is a piece of confidential information and is hidden or not available.

Date and Time in UTC format: The date and time of the request are represented as follows: Day/Month/Year:Hour:Minutes:Seconds +Time-Zone-Correction.

Request Type: The type of request (GET, POST, PUT, DELETE) that the server got. This depends on the operation that the request will do.

API: The API of the website to which the request is related. Example: When a user accesses a cart on a shopping website, the API comes as /usr/cart.

Protocol and Version: Protocol used for connecting with server and its version.

Status Code: Status code that the server returned for the request. E.g., 404 is sent when a requested resource is not found, and 200 is sent when the request was successfully served.

Byte: The amount of data in bytes that was sent back to the client.

Referrer: The website or source from which the user was directed to the current website. If there is none, it is represented by "-".

UA String: The user agent string contains details of the browser and the host device (like the name, version, device type etc.).

Response Time: The response time the server took to serve the request. This is the difference between the timestamps when the request was received and when the request was served.
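The field list above can be turned into a small parser. The following is a minimal sketch, assuming the fields appear in the order listed, that the month is written as an abbreviated name as in standard Apache logs, and that the response time is a trailing integer; none of these details are confirmed by the description, and the sample line is made up for illustration rather than taken from logfiles.log.

```python
import re
from datetime import datetime

# Assumed layout: Apache combined log format followed by one extra
# integer field holding the response time (unit not stated in the source).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<api>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'(?P<response_time>\d+)'
)


def parse_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Day/Month/Year:Hour:Minutes:Seconds +Time-Zone-Correction
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    entry["response_time"] = int(entry["response_time"])
    return entry


# Hypothetical sample line, written only to exercise the parser.
sample = (
    '192.0.2.10 - - [27/Dec/2021:12:00:00 +0530] "GET /usr/cart HTTP/1.0" '
    '200 5028 "-" "Mozilla/5.0 (X11; Linux x86_64)" 45'
)
entry = parse_line(sample)
```

Lines that do not match return None, so a caller can count or log rejects instead of failing on the first malformed entry.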

Content

The dataset consists of two files:
- logfiles.log is the actual log file in text format.
- TestFileGenerator.py is the synthetic log file generator. The number of log entries required can be edited in the code.
Data from: Pillar 3: Pre-processed web server log file dataset of the...
data.mendeley.com
Updated Dec 6, 2021
+ more versions
Cite
Michal Munk (2021). Pillar 3: Pre-processed web server log file dataset of the banking institution [Dataset]. http://doi.org/10.17632/5bvkm76sdc.1
Unique identifier
https://doi.org/10.17632/5bvkm76sdc.1
Dataset updated
Dec 6, 2021
Authors
Michal Munk
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset represents the pre-processed web server log file of a commercial bank. The source of the data is the bank's web server, which recorded accesses by web users from 2009 to 2012. It contains accesses to the bank website during and after the financial crisis. Unnecessary data saved by the web server was removed to keep the focus only on the textual content of the website. Many variables were added to the original log file to make the analysis workable. To protect the privacy of website users, sensitive information in the log file was anonymized. The dataset offers a way to understand the behaviour of stakeholders during and after the crisis and how they comply with the Basel regulations.
Kyoushi Log Data Set
zenodo.org
data.niaid.nih.gov
zip
Updated Apr 24, 2025
Cite
Max Landauer; Maximilian Frank; Florian Skopik; Wolfgang Hotwagner; Markus Wurzenberger; Andreas Rauber (2025). Kyoushi Log Data Set [Dataset]. http://doi.org/10.5281/zenodo.5779411
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.5779411
Dataset updated
Apr 24, 2025
Dataset provided by
Zenodo http://zenodo.org/
Authors
Max Landauer; Maximilian Frank; Florian Skopik; Wolfgang Hotwagner; Markus Wurzenberger; Andreas Rauber
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This repository contains synthetic log data suitable for evaluation of intrusion detection systems. The logs were collected from a testbed that was built at the Austrian Institute of Technology (AIT) following the approaches by [1], [2], and [3]. Please refer to these papers for more detailed information on the dataset and cite them if the data is used for academic publications. Unlike the related AIT-LDSv1.1, this dataset involves a more complex network structure, makes use of a different attack scenario, and collects log data from multiple hosts in the network. In brief, the testbed simulates a small enterprise network including mail server, file share, WordPress server, VPN, firewall, etc. Normal user behavior is simulated to generate background noise. After some days, two attack scenarios are launched against the network. Note that the AIT-LDSv2.0 extends this dataset with additional attack cases and variations of attack parameters.

The archives have the following structure. The gather directory contains the raw log data from each host in the network, as well as their system configurations. The labels directory contains the ground truth for those log files that are labeled. The processing directory contains configurations for the labeling procedure and the rules directory contains the labeling rules. Labeling of events that are related to the attacks is carried out with the Kyoushi Labeling Framework.

Each dataset contains traces of a specific attack scenario:

Scenario 1 (see gather/attacker_0/logs/sm.log for detailed attack log):

nmap scan

WPScan

dirb scan

webshell upload through wpDiscuz exploit (CVE-2020-24186)

privilege escalation

Scenario 2 (see gather/attacker_0/logs/dnsteal.log for detailed attack log):

DNSteal data exfiltration

The log data collected from the servers includes

Apache access and error logs (labeled)

audit logs (labeled)

auth logs (labeled)

VPN logs (labeled)

DNS logs (labeled)

syslog

suricata logs

exim logs

horde logs

mail logs

Note that only log files from affected servers are labeled. Label files and the directories in which they are located have the same name as their corresponding log file in the gather directory. Labels are in JSON format and comprise the following attributes: line (number of line in corresponding log file), labels (list of labels assigned to that log line), rules (names of labeling rules matching that log line). Note that not all attack traces are labeled in all log files; please refer to the labeling rules in case that some labels are not clear.

Acknowledgements: Partially funded by the FFG projects INDICAETING (868306) and DECEPT (873980), and the EU project GUARD (833456).

If you use the dataset, please cite the following publications:

[1] M. Landauer, F. Skopik, M. Wurzenberger, W. Hotwagner and A. Rauber, "Have it Your Way: Generating Customized Log Datasets With a Model-Driven Simulation Testbed," in IEEE Transactions on Reliability, vol. 70, no. 1, pp. 402-415, March 2021, doi: 10.1109/TR.2020.3031317.

[2] M. Landauer, M. Frank, F. Skopik, W. Hotwagner, M. Wurzenberger, and A. Rauber, "A Framework for Automatic Labeling of Log Datasets from Model-driven Testbeds for HIDS Evaluation". ACM Workshop on Secure and Trustworthy Cyber-Physical Systems (ACM SaT-CPS 2022), April 27, 2022, Baltimore, MD, USA. ACM.

[3] M. Frank, "Quality improvement of labels for model-driven benchmark data generation for intrusion detection systems", Master's Thesis, Vienna University of Technology, 2021.
Server Log For Attack Detection
kaggle.com
zip
Updated Dec 18, 2024
Cite
IlifyDev (2024). Server Log For Attack Detection [Dataset]. https://www.kaggle.com/datasets/ilifydev/sever-log-for-attacks-detection
Available download formats: zip (36460 bytes)
Dataset updated
Dec 18, 2024
Authors
IlifyDev
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

This dataset was created by IlifyDev

Released under MIT

Contents
AIT Log Data Set V1.1
zenodo.org
zip
Updated Oct 18, 2023
Cite
Max Landauer; Florian Skopik; Markus Wurzenberger; Wolfgang Hotwagner; Andreas Rauber (2023). AIT Log Data Set V1.1 [Dataset]. http://doi.org/10.5281/zenodo.4264796
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.4264796
Dataset updated
Oct 18, 2023
Dataset provided by
Zenodo http://zenodo.org/
Authors
Max Landauer; Florian Skopik; Markus Wurzenberger; Wolfgang Hotwagner; Andreas Rauber
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
AIT Log Data Sets

This repository contains synthetic log data suitable for evaluation of intrusion detection systems. The logs were collected from four independent testbeds that were built at the Austrian Institute of Technology (AIT) following the approach by Landauer et al. (2020) [1]. Please refer to the paper for more detailed information on automatic testbed generation and cite it if the data is used for academic publications. In brief, each testbed simulates user accesses to a webserver that runs Horde Webmail and OkayCMS. The duration of the simulation is six days. On the fifth day (2020-03-04) two attacks are launched against each web server.

The archive AIT-LDS-v1_0.zip contains the directories "data" and "labels".

The data directory is structured as follows. Each directory mail.

Setup details of the web servers:

OS: Debian Stretch 9.11.6

Services:

Apache2

PHP7

Exim 4.89

Horde 5.2.22

OkayCMS 2.3.4

Suricata

ClamAV

MariaDB

Setup details of user machines:

OS: Ubuntu Bionic

Services:

Chromium

Firefox

User host machines are assigned to web servers in the following way:

mail.cup.com is accessed by users from host machines user-{0, 1, 2, 6}

mail.spiral.com is accessed by users from host machines user-{3, 5, 8}

mail.insect.com is accessed by users from host machines user-{4, 9}

mail.onion.com is accessed by users from host machines user-{7, 10}

The following attacks are launched against the web servers (different starting times for each web server, please check the labels for exact attack times):

Attack 1: multi-step attack with sequential execution of the following attacks:

nmap scan

nikto scan

smtp-user-enum tool for account enumeration

hydra brute force login

webshell upload through Horde exploit (CVE-2019-9858)

privilege escalation through Exim exploit (CVE-2019-10149)

Attack 2: webshell injection through malicious cookie (CVE-2019-16885)

Attacks are launched from the following user host machines. In each of the corresponding directories user-

user-6 attacks mail.cup.com

user-5 attacks mail.spiral.com

user-4 attacks mail.insect.com

user-7 attacks mail.onion.com

The log data collected from the web servers includes

Apache access and error logs

syscall logs collected with the Linux audit daemon

suricata logs

exim logs

auth logs

daemon logs

mail logs

syslogs

user logs

Note that due to their large size, the audit/audit.log files of each server were compressed in a .zip-archive. In case that these logs are needed for analysis, they must first be unzipped.

Labels are organized in the same directory structure as logs. Each file contains two labels for each log line separated by a comma, the first one based on the occurrence time, the second one based on similarity and ordering. Note that this does not guarantee correct labeling for all lines and that no manual corrections were conducted.
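Reading such a label file next to its log file can be sketched as follows. This is a minimal illustration, assuming that the label file has exactly one line per log line in the same order; the helper name pair_labels and the toy label values are hypothetical and are not taken from the dataset.

```python
def pair_labels(log_lines, label_lines):
    """Yield (log line, time-based label, similarity-based label) triples.

    Assumes the n-th label line describes the n-th log line.
    """
    for log_line, label_line in zip(log_lines, label_lines):
        # First label: based on occurrence time.
        # Second label: based on similarity and ordering.
        time_label, similarity_label = label_line.rstrip("\n").split(",", 1)
        yield log_line.rstrip("\n"), time_label.strip(), similarity_label.strip()


# Toy inputs with made-up label values, only to show the shape of the result.
toy_log = ["request one\n", "request two\n"]
toy_labels = ["0,0\n", "attack,attack\n"]
pairs = list(pair_labels(toy_log, toy_labels))
```

Keeping both labels side by side makes it easy to inspect lines where the two labeling strategies disagree, which matters because the description states that no manual corrections were made.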

Version history and related data sets:

AIT-LDS-v1.0: Four datasets, logs from single host, fine-granular audit logs, mail/CMS.

AIT-LDS-v1.1: Removed carriage return of line endings in audit.log files.

AIT-LDS-v2.0: Eight datasets, logs from all hosts, system logs and network traffic, mail/CMS/cloud/web.

Acknowledgements: Partially funded by the FFG projects INDICAETING (868306) and DECEPT (873980), and the EU project GUARD (833456).

If you use the dataset, please cite the following publication:

[1] M. Landauer, F. Skopik, M. Wurzenberger, W. Hotwagner and A. Rauber, "Have it Your Way: Generating Customized Log Datasets With a Model-Driven Simulation Testbed," in IEEE Transactions on Reliability, vol. 70, no. 1, pp. 402-415, March 2021, doi: 10.1109/TR.2020.3031317.
Web robot detection - Server logs
data.niaid.nih.gov
Updated Jan 4, 2021
Cite
Lagopoulos, Athanasios; Tsoumakas, Grigorios (2021). Web robot detection - Server logs [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3477931
Dataset updated
Jan 4, 2021
Dataset provided by
Aristotle University of Thessaloniki
Authors
Lagopoulos, Athanasios; Tsoumakas, Grigorios
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains server logs from the search engine of the library and information center of the Aristotle University of Thessaloniki in Greece (http://search.lib.auth.gr/). The search engine enables users to check the availability of books and other written works, and search for digitized material and scientific publications. The server logs obtained span an entire month, from March 1 to March 31, 2018, and consist of 4,091,155 requests with an average of 131,973 requests per day and a standard deviation of 36,996.7 requests. In total, there are requests from 27,061 unique IP addresses and 3,441 unique user-agent strings. The server logs are in JSON format and they are anonymized by masking the last 6 digits of the IP address and by hashing the last part of the URLs requested (after the last /). The dataset also contains the processed form of the server logs as a labelled dataset of log entries grouped into sessions along with their extracted features (simple semantic features). We make this dataset publicly available, the first one in this domain, in order to provide a common ground for testing web robot detection methods, as well as other methods that analyze server logs.
WordPress DDos Log Dataset
kaggle.com
zip
Updated Apr 26, 2023
Cite
Ajiboye Toluwalase (2023). WordPress DDos Log Dataset [Dataset]. https://www.kaggle.com/datasets/ajiboyetoluwalase/wordpress-ddos-log-dataset
Available download formats: zip (21215 bytes)
Dataset updated
Apr 26, 2023
Authors
Ajiboye Toluwalase
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

In a distributed denial-of-service (DDoS) attack, multiple compromised computer systems attack a target and cause a denial of service for users of the targeted resource. The target can be a server, website or other network resource. The flood of incoming messages, connection requests or malformed packets to the target system forces it to slow down or even crash and shut down, thereby denying service to legitimate users or systems.

- The dataset contains information about a DDoS attack on a server, taken from the server's log.
- I wanted to create this dataset so that data scientists could try to tackle it and maybe even find a way of highlighting or identifying attacks beforehand.

Please Note

The attacker is using the port [500,913]

About

This dataset contains 568 rows of requests to the server before and during the DDoS attack.

IP address - a unique string of characters that identifies each computer using the Internet Protocol to communicate over a network.

Port - a virtual point where network connections start and end.

WordPress Versions - the version of WordPress used for the request.

Website - the website used for the request.

DT - the day, month and year of the request.

Time - the hour, minute and second of the request.

Aim of this dataset

- The goal is mainly visualization and feature engineering to highlight or identify an attack or the attacker.
- We can create data dashboards with Tableau or Power BI.
- We can also create models that can identify an attack.
Sample Apache Log file
kaggle.com
zip
Updated Nov 8, 2025
Cite
fatih python (2025). Sample Apache Log file [Dataset]. https://www.kaggle.com/datasets/fatihpython/sample-apache-log-file
Available download formats: zip (9559 bytes)
Dataset updated
Nov 8, 2025
Authors
fatih python
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains a sample of real-world Apache HTTP server log entries. The logs capture events from December 2005, showcasing server initialization messages, worker environment setup, and mod_jk error states. These logs can be useful for exploring and understanding typical web server activity, error handling, and worker thread management in Apache environments.
AIT Log Data Set V2.0
zenodo.org
data.niaid.nih.gov
+1more
zip
Updated Jun 28, 2024
+ more versions
Cite
Max Landauer; Florian Skopik; Maximilian Frank; Wolfgang Hotwagner; Markus Wurzenberger; Andreas Rauber (2024). AIT Log Data Set V2.0 [Dataset]. http://doi.org/10.5281/zenodo.5789064
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.5789064
Dataset updated
Jun 28, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
Max Landauer; Florian Skopik; Maximilian Frank; Wolfgang Hotwagner; Markus Wurzenberger; Andreas Rauber
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
AIT Log Data Sets

This repository contains synthetic log data suitable for evaluation of intrusion detection systems, federated learning, and alert aggregation. A detailed description of the dataset is available in [1]. The logs were collected from eight testbeds that were built at the Austrian Institute of Technology (AIT) following the approach by [2]. Please cite these papers if the data is used for academic publications.

In brief, each of the datasets corresponds to a testbed representing a small enterprise network including mail server, file share, WordPress server, VPN, firewall, etc. Normal user behavior is simulated to generate background noise over a time span of 4-6 days. At some point, a sequence of attack steps is launched against the network. Log data is collected from all hosts and includes Apache access and error logs, authentication logs, DNS logs, VPN logs, audit logs, Suricata logs, network traffic packet captures, horde logs, exim logs, syslog, and system monitoring logs. Separate ground truth files are used to label events that are related to the attacks. Compared to the AIT-LDSv1.1, a more complex network and diverse user behavior is simulated, and logs are collected from all hosts in the network. If you are only interested in network traffic analysis, we also provide the AIT-NDS containing the labeled netflows of the testbed networks. We also provide the AIT-ADS, an alert data set derived by forensically applying open-source intrusion detection systems on the log data.

The datasets in this repository have the following structure:

The gather directory contains all logs collected from the testbed. Logs collected from each host are located in gather/.

The labels directory contains the ground truth of the dataset that indicates which events are related to attacks. The directory mirrors the structure of the gather directory so that each label file is located at the same path and has the same name as the corresponding log file. Each line in the label files references the log event corresponding to an attack by the line number counted from the beginning of the file ("line"), the labels assigned to the line that state the respective attack step ("labels"), and the labeling rules that assigned the labels ("rules"). An example is provided below.

The processing directory contains the source code that was used to generate the labels.

The rules directory contains the labeling rules.

The environment directory contains the source code that was used to deploy the testbed and run the simulation using the Kyoushi Testbed Environment.

The dataset.yml file specifies the start and end time of the simulation.

The following table summarizes relevant properties of the datasets:

Dataset | Simulation time | Attack time | Scan volume | Unpacked size | Notes
fox | 2022-01-15 00:00 - 2022-01-20 00:00 | 2022-01-18 11:59 - 2022-01-18 13:15 | High | 26 GB |
harrison | 2022-02-04 00:00 - 2022-02-09 00:00 | 2022-02-08 07:07 - 2022-02-08 08:38 | High | 27 GB |
russellmitchell | 2022-01-21 00:00 - 2022-01-25 00:00 | 2022-01-24 03:01 - 2022-01-24 04:39 | Low | 14 GB |
santos | 2022-01-14 00:00 - 2022-01-18 00:00 | 2022-01-17 11:15 - 2022-01-17 11:59 | Low | 17 GB |
shaw | 2022-01-25 00:00 - 2022-01-31 00:00 | 2022-01-29 14:37 - 2022-01-29 15:21 | Low | 27 GB | Data exfiltration is not visible in DNS logs
wardbeck | 2022-01-19 00:00 - 2022-01-24 00:00 | 2022-01-23 12:10 - 2022-01-23 12:56 | Low | 26 GB |
wheeler | 2022-01-26 00:00 - 2022-01-31 00:00 | 2022-01-30 07:35 - 2022-01-30 17:53 | High | 30 GB | No password cracking in attack chain
wilson | 2022-02-03 00:00 - 2022-02-09 00:00 | 2022-02-07 10:57 - 2022-02-07 11:49 | High | 39 GB |

The following attacks are launched in the network:

Scans (nmap, WPScan, dirb)

Webshell upload (CVE-2020-24186)

Password cracking (John the Ripper)

Privilege escalation

Remote command execution

Data exfiltration (DNSteal)

Note that attack parameters and their execution orders vary in each dataset. Labeled log files are trimmed to the simulation time to ensure that their labels (which reference the related event by the line number in the file) are not misleading. Other log files, however, also contain log events generated before or after the simulation time and may therefore be affected by testbed setup or data collection. It is therefore recommended to only consider logs with timestamps within the simulation time for analysis.
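The recommendation to keep only events inside the simulation time can be sketched for audit logs as follows. This is a minimal illustration, assuming the simulation times are given in UTC (the description does not state the time zone) and using the russellmitchell simulation window; the helper names are hypothetical. The first sample line is the audit record quoted in this description, and the second is a made-up record placed before the window.

```python
import re
from datetime import datetime, timezone

# Audit records carry an epoch timestamp, e.g. msg=audit(1642999060.603:2226)
AUDIT_TS = re.compile(r"msg=audit\((\d+\.\d+):\d+\)")


def audit_time(line):
    """Return the record time as an aware UTC datetime, or None if absent."""
    match = AUDIT_TS.search(line)
    if match is None:
        return None
    return datetime.fromtimestamp(float(match.group(1)), tz=timezone.utc)


def within_window(lines, start, end):
    """Yield only the lines whose timestamp falls in [start, end)."""
    for line in lines:
        ts = audit_time(line)
        if ts is not None and start <= ts < end:
            yield line


# Simulation window of the russellmitchell dataset, assumed to be UTC.
start = datetime(2022, 1, 21, tzinfo=timezone.utc)
end = datetime(2022, 1, 25, tzinfo=timezone.utc)

inside = ('type=USER_AUTH msg=audit(1642999060.603:2226): pid=27950 uid=33 '
          'auid=4294967295 ses=4294967295 msg=\'op=PAM:authentication '
          'acct="jhall" exe="/bin/su" hostname=? addr=? '
          'terminal=/dev/pts/1 res=success\'')
# Hypothetical record from before the simulation started.
outside = "type=SYSCALL msg=audit(1642000000.000:17): pid=1 uid=0"

kept = list(within_window([inside, outside], start, end))
```

Note that this filter should only be applied to unlabeled log files: labeled files are already trimmed, and dropping lines from them would shift the line numbers that the labels refer to.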

The structure of labels is explained using the audit logs from the intranet server in the russellmitchell data set as an example in the following. The first four labels in the labels/intranet_server/logs/audit/audit.log file are as follows:

{"line": 1860, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

{"line": 1861, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

{"line": 1862, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

{"line": 1863, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

Each JSON object in this file assigns a label to one specific log line in the corresponding log file located at gather/intranet_server/logs/audit/audit.log. The field "line" in the JSON objects specifies the line number of the respective event in the original log file, while the field "labels" comprises the corresponding labels. For example, the lines in the sample above provide the information that lines 1860-1863 in the gather/intranet_server/logs/audit/audit.log file are labeled with "attacker_change_user" and "escalate" corresponding to the attack step where the attacker receives escalated privileges. Inspecting these lines shows that they indeed correspond to the user authenticating as root:

type=USER_AUTH msg=audit(1642999060.603:2226): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:authentication acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

type=USER_ACCT msg=audit(1642999060.603:2227): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:accounting acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

type=CRED_ACQ msg=audit(1642999060.615:2228): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:setcred acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

type=USER_START msg=audit(1642999060.627:2229): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:session_open acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

The same applies to all other labels for this log file and all other log files. There are no labels for logs generated by "normal" (i.e., non-attack) behavior; instead, all log events that have no corresponding JSON object in one of the files from the labels directory, such as the lines 1-1859 in the example above, can be considered to be labeled as "normal". This means that in order to figure out the labels for the log data it is necessary to store the line numbers when processing the original logs from the gather directory and see if these line numbers also appear in the corresponding file in the labels directory.
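The lookup described above can be sketched as follows. This is a minimal illustration, assuming 1-based line numbers (consistent with lines 1-1859 being called normal) and one JSON object per line in the label file; the function names and the toy log lines are hypothetical, while the label object reuses the fields shown in the sample.

```python
import json


def load_labels(label_lines):
    """Map each labeled line number to its list of labels."""
    labels_by_line = {}
    for raw in label_lines:
        raw = raw.strip()
        if not raw:
            continue
        obj = json.loads(raw)
        labels_by_line[obj["line"]] = obj["labels"]
    return labels_by_line


def label_log(log_lines, labels_by_line):
    """Yield (line number, labels, text); unlabeled lines count as normal."""
    for line_no, text in enumerate(log_lines, start=1):
        yield line_no, labels_by_line.get(line_no, ["normal"]), text.rstrip("\n")


# Toy stand-ins for a file under labels/ and its counterpart under gather/.
toy_labels = [
    '{"line": 2, "labels": ["attacker_change_user", "escalate"], '
    '"rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], '
    '"escalate": ["attacker.escalate.audit.su.login"]}}'
]
toy_log = ["event A\n", "event B\n", "event C\n"]

labeled = list(label_log(toy_log, load_labels(toy_labels)))
```

In practice both arguments would be open file handles for the same relative path under the labels and gather directories, so the log is streamed once and each line is looked up in the dictionary.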

Besides the attack labels, a general overview of the exact times when specific attack steps are launched is available in gather/attacker_0/logs/attacks.log. An enumeration of all hosts and their IP addresses is stated in processing/config/servers.yml. Moreover, configurations of each host are provided in gather/ and gather/.

Version history:

AIT-LDS-v1.x: Four datasets, logs from single host, fine-granular audit logs, mail/CMS.

AIT-LDS-v2.0: Eight datasets, logs from all hosts, system logs and network traffic, mail/CMS/cloud/web.

Acknowledgements: Partially funded by the FFG projects INDICAETING (868306) and DECEPT (873980), and the EU projects GUARD (833456) and PANDORA (SI2.835928).

If you use the dataset, please cite the following publications:

[1] M. Landauer, F. Skopik, M. Frank, W. Hotwagner,
Comprehensive Network Logs Dataset for Multi-Device Analysis
data.niaid.nih.gov
Updated Jan 11, 2024
Cite
Salman, Mahmood; Hasan, Raza (2024). Comprehensive Network Logs Dataset for Multi-Device Analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10492769
Dataset updated
Jan 11, 2024
Dataset provided by
Malaysia University of Science and Technology
Southampton Solent University
Authors
Salman, Mahmood; Hasan, Raza
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset comprises diverse logs from various sources, including cloud services, routers, switches, virtualization, network security appliances, authentication systems, DNS, operating systems, packet captures, proxy servers, servers, syslog data, and network data. The logs encompass a wide range of information such as traffic details, user activities, authentication events, DNS queries, network flows, security actions, and system events. By analyzing these logs collectively, users can gain insights into network patterns, anomalies, user authentication, cloud service usage, DNS traffic, network flows, security incidents, and system activities. The dataset is invaluable for network monitoring, performance analysis, anomaly detection, security investigations, and correlating events across the entire network infrastructure.
Online Gaming Server Latency Logs
gomask.ai
csv, json
Updated Aug 21, 2025
Cite
GoMask.ai (2025). Online Gaming Server Latency Logs [Dataset]. https://gomask.ai/marketplace/datasets/online-gaming-server-latency-logs
Available download formats: csv (10 MB), json
Dataset updated
Aug 21, 2025
Dataset provided by
GoMask.ai
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
2024 - 2025
Area covered
Global
Variables measured
log_id, game_id, isp_name, game_name, jitter_ms, player_id, player_ip, server_id, server_ip, timestamp, and 6 more
Description
This dataset provides detailed, time-stamped latency measurements for multiplayer gaming servers, including player and server identifiers, geographic regions, network conditions, and connection types. It is ideal for infrastructure monitoring, network optimization, and player experience analysis across global gaming platforms.
test-mcp-logs
huggingface.co
Updated Aug 2, 2025
Cite
Hugging Face MCP Server (2025). test-mcp-logs [Dataset]. https://huggingface.co/datasets/hf-mcp-server/test-mcp-logs
Dataset updated
Aug 2, 2025
Dataset provided by
Hugging Face https://huggingface.co/
Authors
Hugging Face MCP Server
Description
(Put queries first as heuristics don't detect when there are no logs)
NASA Website Data
kaggle.com
zip
Updated May 27, 2022
Cite
Abdeldjaouad Nusayr Medakene (2022). NASA Website Data [Dataset]. https://www.kaggle.com/datasets/djaouadnm/nasa-website-data/discussion
Explore at:
Available download formats: zip (16879368 bytes)
Dataset updated
May 27, 2022
Authors
Abdeldjaouad Nusayr Medakene
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains a single month of server logs from the NASA.gov website. The data is all from August 1995. This dataset was put together as supporting data for an article on exploring data with SQL. Check it out

Source Data on NASA.gov

NOTE: the dataset appears to be missing the entries for August 2nd.
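
A gap like the one noted above can be checked by collecting the calendar dates that occur in the log and comparing them with the expected range. The following is a minimal sketch, assuming the lines follow the Common Log Format with a bracketed timestamp such as `[01/Aug/1995:00:00:01 -0400]`. The function names and the sample lines are hypothetical and are not taken from the dataset.

```python
import re
from datetime import date, datetime, timedelta

# Matches the date part of a bracketed Common Log Format timestamp,
# e.g. [01/Aug/1995:00:00:01 -0400] (assumed layout).
TIMESTAMP = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")


def days_present(lines):
    """Return the set of calendar dates that appear in the log lines."""
    seen = set()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            seen.add(datetime.strptime(match.group(1), "%d/%b/%Y").date())
    return seen


def missing_days(lines, first, last):
    """List every date in the inclusive range [first, last] with no entry."""
    seen = days_present(lines)
    span = (last - first).days + 1
    candidates = (first + timedelta(days=i) for i in range(span))
    return [day for day in candidates if day not in seen]


# Hypothetical sample lines for illustration only.
SAMPLE = [
    'host1.example - - [01/Aug/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 1839',
    'host2.example - - [03/Aug/1995:00:00:09 -0400] "GET /images/logo.gif HTTP/1.0" 200 786',
]

gaps = missing_days(SAMPLE, date(1995, 8, 1), date(1995, 8, 3))
print(gaps)
```

For the real file, the same functions could be applied to an open file handle in place of `SAMPLE`, with the range set to the first and last day of August 1995.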
malicious_logs
huggingface.co
Updated Mar 26, 2020
Cite
u-haru (2020). malicious_logs [Dataset]. https://huggingface.co/datasets/u-haru/malicious_logs
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 26, 2020
Authors
u-haru
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Malicious Logs

These are malicious logs collected from my Nginx server. An Isolation Forest is used to collect these logs.
Model: u-haru/log-inspector
Code: github.com/u-haru/log-inspector
Server_logs_dataset
kaggle.com
zip
Updated Jul 1, 2020
Cite
Azmat (2020). Server_logs_dataset [Dataset]. https://www.kaggle.com/azmatsiddique/server-logs-dataset
Explore at:
Available download formats: zip (12997 bytes)
Dataset updated
Jul 1, 2020
Authors
Azmat
Description
Dataset

This dataset was created by Azmat

Korea Employment Information Service_Number of visits and page views (based...
data.go.kr
csv
Updated Aug 1, 2025
Cite
(2025). Korea Employment Information Service_Number of visits and page views (based on server log) of WorkNet middle-aged service [Dataset]. https://www.data.go.kr/en/data/15114512/fileData.do
Explore at:
Available download formats: csv
Dataset updated
Aug 1, 2025
License
https://data.go.kr/ugs/selectPortalPolicyView.do
Description
This data is based on web server logs and analyzes Worknet service usage by middle-aged and older adults from 2021 to 2022. It is structured to understand the service access and usage patterns of middle-aged and older job seekers and career changers within Worknet. This data is aggregated primarily by the number of visits and page views for service pages dedicated to middle-aged and older adults, which are automatically recorded values based on user browser-based server access history. It allows for analysis of trends over time, concentration by day or time, and frequency of use by content. This data serves as the basis for identifying demand for job support content among middle-aged and older adults, analyzing the promotional effectiveness of policies, and improving digital service accessibility. In particular, analyzing page views by core categories within Worknet, such as middle-aged and older job services, education and training, and career counseling pages, would enable more sophisticated demand-tailored policy design.
Simulated Banking Server Data for PM
kaggle.com
zip
Updated Oct 27, 2024
Cite
RadioactiveM (2024). Simulated Banking Server Data for PM [Dataset]. https://www.kaggle.com/datasets/radioactivem/server-logs-data-for-server-failure
Explore at:
Available download formats: zip (980559 bytes)
Dataset updated
Oct 27, 2024
Authors
RadioactiveM
License
ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
License information was derived automatically
Description
This dataset is a simulated representation of banking server log data. It is specifically designed for predictive maintenance use cases, with a focus on identifying banking server downtime based on various system performance metrics. The data mimics real-world conditions and has been adjusted to reflect the unique demands of banking applications, including managing high transaction volumes and ensuring system reliability.

The dataset can be used to train machine learning models for classification tasks that predict server downtime. It includes multiple server performance indicators, such as CPU usage, memory usage, disk I/O, network latency, and error rate.
web log dataset
kaggle.com
zip
Updated Dec 18, 2019
Cite
Ashadullah Shawon (2019). web log dataset [Dataset]. https://www.kaggle.com/shawon10/web-log-dataset
Explore at:
Available download formats: zip (82459 bytes)
Dataset updated
Dec 18, 2019
Authors
Ashadullah Shawon
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

I have developed an online judge for my university, named RUET OJ. I am sharing the server log dataset of RUET OJ.

Content

This dataset has 16008 rows and 4 columns. Columns are IP, Time, URL, Response Status.
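
A first look at a four-column log like this could be a tally of response statuses. The following is a minimal sketch, assuming the file is a CSV whose header row uses the column names listed above. The exact header spelling, the `status_counts` helper, and the sample rows are assumptions for illustration, not taken from the dataset.

```python
import csv
import io
from collections import Counter


def status_counts(csv_file, status_column="Response Status"):
    """Count how often each response status occurs in a CSV web log.

    `status_column` is an assumed header name; adjust it to match the file.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[status_column].strip() for row in reader)


# Hypothetical rows for illustration only.
SAMPLE = io.StringIO(
    "IP,Time,URL,Response Status\n"
    "10.0.0.1,2019-01-01 10:00:00,GET /login.php HTTP/1.1,200\n"
    "10.0.0.2,2019-01-01 10:00:05,GET /missing.php HTTP/1.1,404\n"
    "10.0.0.1,2019-01-01 10:00:09,GET /home.php HTTP/1.1,200\n"
)

counts = status_counts(SAMPLE)
print(counts.most_common())
```

For the real file, an open file handle (for example one opened with `newline=""`, as the `csv` module recommends) would take the place of `SAMPLE`.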

Acknowledgements

This dataset is too small for research, but I hope other people will also share larger web log datasets, as web log datasets are rare here.

Inspiration

This dataset will inspire other people to share the web log datasets they have collected.

Online Shopping Store - Web Server Logs: 15 scholarly articles cite this dataset (View in Google Scholar)

