96 datasets found
  1. EDGAR Log File Data Sets

    • catalog.data.gov
    Updated Jul 16, 2025
    Cite
    EDGAR Business Office (2025). EDGAR Log File Data Sets [Dataset]. https://catalog.data.gov/dataset/edgar-log-file-data-set
    Explore at:
    Dataset updated
    Jul 16, 2025
    Dataset provided by
    Electronic Data Gathering, Analysis, and Retrieval (EDGAR), http://www.sec.gov/edgar.shtml
    Description

    The data sets provide information on internet search traffic for EDGAR filings through SEC.gov.

  2. ProcMon log file

    • ieee-dataport.org
    Updated May 3, 2020
    Cite
    Huyen Nguyen (2020). ProcMon log file [Dataset]. https://ieee-dataport.org/documents/procmon-log-file
    Explore at:
    Dataset updated
    May 3, 2020
    Authors
    Huyen Nguyen
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    including running malware WannaPeace and Infostealer.Dexter.

  3. Kyoushi Log Data Set

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Apr 24, 2025
    Cite
    Max Landauer; Maximilian Frank; Florian Skopik; Wolfgang Hotwagner; Markus Wurzenberger; Andreas Rauber (2025). Kyoushi Log Data Set [Dataset]. http://doi.org/10.5281/zenodo.5779411
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Max Landauer; Maximilian Frank; Florian Skopik; Wolfgang Hotwagner; Markus Wurzenberger; Andreas Rauber
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This repository contains synthetic log data suitable for evaluation of intrusion detection systems. The logs were collected from a testbed that was built at the Austrian Institute of Technology (AIT) following the approaches by [1], [2], and [3]. Please refer to these papers for more detailed information on the dataset and cite them if the data is used for academic publications. In contrast to the related AIT-LDSv1.1, this dataset involves a more complex network structure, makes use of a different attack scenario, and collects log data from multiple hosts in the network. In brief, the testbed simulates a small enterprise network including mail server, file share, WordPress server, VPN, firewall, etc. Normal user behavior is simulated to generate background noise. After some days, two attack scenarios are launched against the network. Note that the AIT-LDSv2.0 extends this dataset with additional attack cases and variations of attack parameters.

    The archives have the following structure. The gather directory contains the raw log data from each host in the network, as well as their system configurations. The labels directory contains the ground truth for those log files that are labeled. The processing directory contains configurations for the labeling procedure and the rules directory contains the labeling rules. Labeling of events that are related to the attacks is carried out with the Kyoushi Labeling Framework.

    Each dataset contains traces of a specific attack scenario:

    • Scenario 1 (see gather/attacker_0/logs/sm.log for detailed attack log):
      • nmap scan
      • WPScan
      • dirb scan
      • webshell upload through wpDiscuz exploit (CVE-2020-24186)
      • privilege escalation
    • Scenario 2 (see gather/attacker_0/logs/dnsteal.log for detailed attack log):
      • DNSteal data exfiltration

    The log data collected from the servers includes

    • Apache access and error logs (labeled)
    • audit logs (labeled)
    • auth logs (labeled)
    • VPN logs (labeled)
    • DNS logs (labeled)
    • syslog
    • suricata logs
    • exim logs
    • horde logs
    • mail logs

    Note that only log files from affected servers are labeled. Label files and the directories in which they are located have the same names as their corresponding log files in the gather directory. Labels are in JSON format and comprise the following attributes: line (the number of the line in the corresponding log file), labels (the list of labels assigned to that log line), and rules (the names of the labeling rules matching that log line). Note that not all attack traces are labeled in all log files; please refer to the labeling rules if some labels are unclear.
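
    Because the labels are sparse (only attack-related lines appear in the label files), a simple way to work with them is to load a label file into a map keyed by line number. The following is a minimal sketch under that assumption; the path shown is hypothetical and should be replaced with an actual host and log file from the archive:

      import json

      def load_labels(label_path):
          """Build a {line_number: [labels]} map from a Kyoushi label file.

          Each non-empty line is assumed to be a JSON object with the
          "line", "labels", and "rules" attributes described above.
          """
          line_labels = {}
          with open(label_path, encoding="utf-8") as f:
              for raw in f:
                  raw = raw.strip()
                  if raw:
                      entry = json.loads(raw)
                      line_labels[entry["line"]] = entry["labels"]
          return line_labels

      # Hypothetical path following the structure described above:
      # labels = load_labels("labels/intranet_server/logs/apache2/access.log")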

    Acknowledgements: Partially funded by the FFG projects INDICAETING (868306) and DECEPT (873980), and the EU project GUARD (833456).

    If you use the dataset, please cite the following publications:

    [1] M. Landauer, F. Skopik, M. Wurzenberger, W. Hotwagner and A. Rauber, "Have it Your Way: Generating Customized Log Datasets With a Model-Driven Simulation Testbed," in IEEE Transactions on Reliability, vol. 70, no. 1, pp. 402-415, March 2021, doi: 10.1109/TR.2020.3031317.

    [2] M. Landauer, M. Frank, F. Skopik, W. Hotwagner, M. Wurzenberger, and A. Rauber, "A Framework for Automatic Labeling of Log Datasets from Model-driven Testbeds for HIDS Evaluation". ACM Workshop on Secure and Trustworthy Cyber-Physical Systems (ACM SaT-CPS 2022), April 27, 2022, Baltimore, MD, USA. ACM.

    [3] M. Frank, "Quality improvement of labels for model-driven benchmark data generation for intrusion detection systems", Master's Thesis, Vienna University of Technology, 2021.

  4. OpenStack log files

    • figshare.com
    zip
    Updated Nov 16, 2021
    Cite
    Mbasa MOLO; Victor Akande; Nzanzu Vingi Patrick; Joke Badejo; Emmanuel Adetiba (2021). OpenStack log files [Dataset]. http://doi.org/10.6084/m9.figshare.17025353.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 16, 2021
    Dataset provided by
    figshare
    Authors
    Mbasa MOLO; Victor Akande; Nzanzu Vingi Patrick; Joke Badejo; Emmanuel Adetiba
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains log information from a cloud computing infrastructure based on OpenStack. Three different files are available, containing the nova, cinder, and glance log files. Because the data is unbalanced, a CSV file containing log information from the three OpenStack applications is also provided; it can be used for testing if the log files are used for machine learning purposes. These data were collected from the Federated Genomic (FEDGEN) cloud computing infrastructure hosted at Covenant University under the Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE) project funded by the World Bank.

  5. AIT Log Data Set V1.1

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Oct 18, 2023
    + more versions
    Cite
    Hotwagner, Wolfgang (2023). AIT Log Data Set V1.1 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3723082
    Explore at:
    Dataset updated
    Oct 18, 2023
    Dataset provided by
    Skopik, Florian
    Wurzenberger, Markus
    Rauber, Andreas
    Landauer, Max
    Hotwagner, Wolfgang
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    AIT Log Data Sets

    This repository contains synthetic log data suitable for evaluation of intrusion detection systems. The logs were collected from four independent testbeds that were built at the Austrian Institute of Technology (AIT) following the approach by Landauer et al. (2020) [1]. Please refer to the paper for more detailed information on automatic testbed generation and cite it if the data is used for academic publications. In brief, each testbed simulates user accesses to a webserver that runs Horde Webmail and OkayCMS. The duration of the simulation is six days. On the fifth day (2020-03-04) two attacks are launched against each web server.

    The archive AIT-LDS-v1_0.zip contains the directories "data" and "labels".

    The data directory is structured as follows. Each directory mail.<name>.com contains the logs of one web server. Each directory user-<number> contains the logs of one user host machine, where one or more users are simulated. Each file log.log in the user-<number> directories contains the activity logs of one particular user.

    Setup details of the web servers:

    OS: Debian Stretch 9.11.6

    Services:

    Apache2

    PHP7

    Exim 4.89

    Horde 5.2.22

    OkayCMS 2.3.4

    Suricata

    ClamAV

    MariaDB

    Setup details of user machines:

    OS: Ubuntu Bionic

    Services:

    Chromium

    Firefox

    User host machines are assigned to web servers in the following way:

    mail.cup.com is accessed by users from host machines user-{0, 1, 2, 6}

    mail.spiral.com is accessed by users from host machines user-{3, 5, 8}

    mail.insect.com is accessed by users from host machines user-{4, 9}

    mail.onion.com is accessed by users from host machines user-{7, 10}

    The following attacks are launched against the web servers (different starting times for each web server, please check the labels for exact attack times):

    Attack 1: multi-step attack with sequential execution of the following attacks:

    nmap scan

    nikto scan

    smtp-user-enum tool for account enumeration

    hydra brute force login

    webshell upload through Horde exploit (CVE-2019-9858)

    privilege escalation through Exim exploit (CVE-2019-10149)

    Attack 2: webshell injection through malicious cookie (CVE-2019-16885)

    Attacks are launched from the following user host machines. In each of the corresponding user-<number> directories, logs of the attack execution are found in the file attackLog.txt:

    user-6 attacks mail.cup.com

    user-5 attacks mail.spiral.com

    user-4 attacks mail.insect.com

    user-7 attacks mail.onion.com

    The log data collected from the web servers includes

    Apache access and error logs

    syscall logs collected with the Linux audit daemon

    suricata logs

    exim logs

    auth logs

    daemon logs

    mail logs

    syslogs

    user logs

    Note that due to their large size, the audit/audit.log files of each server were compressed in a .zip-archive. In case that these logs are needed for analysis, they must first be unzipped.

    Labels are organized in the same directory structure as logs. Each file contains two labels for each log line separated by a comma, the first one based on the occurrence time, the second one based on similarity and ordering. Note that this does not guarantee correct labeling for all lines and that no manual corrections were conducted.
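
    One way to consume these label files is to pair them line by line with the corresponding log file. The sketch below assumes exactly that layout (one comma-separated label pair per log line); the paths are illustrative and the concrete label values should be checked against the archive:

      def read_label_pairs(label_path):
          """Return (time_based_label, similarity_based_label) tuples, one per log line."""
          pairs = []
          with open(label_path, encoding="utf-8") as f:
              for line in f:
                  first, second = line.rstrip("\n").split(",", 1)
                  pairs.append((first, second))
          return pairs

      # Illustrative pairing of a log file with its label file (paths assumed):
      # with open("data/mail.cup.com/apache2/access.log", errors="replace") as log_file:
      #     for log_line, labels in zip(log_file, read_label_pairs("labels/mail.cup.com/apache2/access.log")):
      #         ...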

    Version history and related data sets:

    AIT-LDS-v1.0: Four datasets, logs from single host, fine-granular audit logs, mail/CMS.

    AIT-LDS-v1.1: Removed carriage return of line endings in audit.log files.

    AIT-LDS-v2.0: Eight datasets, logs from all hosts, system logs and network traffic, mail/CMS/cloud/web.

    Acknowledgements: Partially funded by the FFG projects INDICAETING (868306) and DECEPT (873980), and the EU project GUARD (833456).

    If you use the dataset, please cite the following publication:

    [1] M. Landauer, F. Skopik, M. Wurzenberger, W. Hotwagner and A. Rauber, "Have it Your Way: Generating Customized Log Datasets With a Model-Driven Simulation Testbed," in IEEE Transactions on Reliability, vol. 70, no. 1, pp. 402-415, March 2021, doi: 10.1109/TR.2020.3031317. [PDF]

  6. AIT Log Data Set V2.0

    • zenodo.org
    • explore.openaire.eu
    • +1more
    zip
    Updated Jun 28, 2024
    Cite
    Max Landauer; Florian Skopik; Maximilian Frank; Wolfgang Hotwagner; Markus Wurzenberger; Andreas Rauber (2024). AIT Log Data Set V2.0 [Dataset]. http://doi.org/10.5281/zenodo.5789064
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 28, 2024
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Max Landauer; Florian Skopik; Maximilian Frank; Wolfgang Hotwagner; Markus Wurzenberger; Andreas Rauber
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    AIT Log Data Sets

    This repository contains synthetic log data suitable for evaluation of intrusion detection systems, federated learning, and alert aggregation. A detailed description of the dataset is available in [1]. The logs were collected from eight testbeds that were built at the Austrian Institute of Technology (AIT) following the approach by [2]. Please cite these papers if the data is used for academic publications.

    In brief, each of the datasets corresponds to a testbed representing a small enterprise network including mail server, file share, WordPress server, VPN, firewall, etc. Normal user behavior is simulated to generate background noise over a time span of 4-6 days. At some point, a sequence of attack steps is launched against the network. Log data is collected from all hosts and includes Apache access and error logs, authentication logs, DNS logs, VPN logs, audit logs, Suricata logs, network traffic packet captures, horde logs, exim logs, syslog, and system monitoring logs. Separate ground truth files are used to label events that are related to the attacks. Compared to the AIT-LDSv1.1, a more complex network and diverse user behavior is simulated, and logs are collected from all hosts in the network. If you are only interested in network traffic analysis, we also provide the AIT-NDS containing the labeled netflows of the testbed networks. We also provide the AIT-ADS, an alert data set derived by forensically applying open-source intrusion detection systems on the log data.

    The datasets in this repository have the following structure:

    • The gather directory contains all logs collected from the testbed. Logs collected from each host are located in gather/<host_name>/.
    • The labels directory contains the ground truth of the dataset that indicates which events are related to attacks. The directory mirrors the structure of the gather directory, so that each label file is located at the same path and has the same name as the corresponding log file. Each line in the label files references the log event corresponding to an attack by the line number counted from the beginning of the file ("line"), the labels assigned to the line that state the respective attack step ("labels"), and the labeling rules that assigned the labels ("rules"). An example is provided below.
    • The processing directory contains the source code that was used to generate the labels.
    • The rules directory contains the labeling rules.
    • The environment directory contains the source code that was used to deploy the testbed and run the simulation using the Kyoushi Testbed Environment.
    • The dataset.yml file specifies the start and end time of the simulation.

    The following table summarizes relevant properties of the datasets:

    • fox
      • Simulation time: 2022-01-15 00:00 - 2022-01-20 00:00
      • Attack time: 2022-01-18 11:59 - 2022-01-18 13:15
      • Scan volume: High
      • Unpacked size: 26 GB
    • harrison
      • Simulation time: 2022-02-04 00:00 - 2022-02-09 00:00
      • Attack time: 2022-02-08 07:07 - 2022-02-08 08:38
      • Scan volume: High
      • Unpacked size: 27 GB
    • russellmitchell
      • Simulation time: 2022-01-21 00:00 - 2022-01-25 00:00
      • Attack time: 2022-01-24 03:01 - 2022-01-24 04:39
      • Scan volume: Low
      • Unpacked size: 14 GB
    • santos
      • Simulation time: 2022-01-14 00:00 - 2022-01-18 00:00
      • Attack time: 2022-01-17 11:15 - 2022-01-17 11:59
      • Scan volume: Low
      • Unpacked size: 17 GB
    • shaw
      • Simulation time: 2022-01-25 00:00 - 2022-01-31 00:00
      • Attack time: 2022-01-29 14:37 - 2022-01-29 15:21
      • Scan volume: Low
      • Data exfiltration is not visible in DNS logs
      • Unpacked size: 27 GB
    • wardbeck
      • Simulation time: 2022-01-19 00:00 - 2022-01-24 00:00
      • Attack time: 2022-01-23 12:10 - 2022-01-23 12:56
      • Scan volume: Low
      • Unpacked size: 26 GB
    • wheeler
      • Simulation time: 2022-01-26 00:00 - 2022-01-31 00:00
      • Attack time: 2022-01-30 07:35 - 2022-01-30 17:53
      • Scan volume: High
      • No password cracking in attack chain
      • Unpacked size: 30 GB
    • wilson
      • Simulation time: 2022-02-03 00:00 - 2022-02-09 00:00
      • Attack time: 2022-02-07 10:57 - 2022-02-07 11:49
      • Scan volume: High
      • Unpacked size: 39 GB

    The following attacks are launched in the network:

    • Scans (nmap, WPScan, dirb)
    • Webshell upload (CVE-2020-24186)
    • Password cracking (John the Ripper)
    • Privilege escalation
    • Remote command execution
    • Data exfiltration (DNSteal)

    Note that attack parameters and their execution orders vary in each dataset. Labeled log files are trimmed to the simulation time to ensure that their labels (which reference the related event by the line number in the file) are not misleading. Other log files, however, also contain log events generated before or after the simulation time and may therefore be affected by testbed setup or data collection. It is therefore recommended to only consider logs with timestamps within the simulation time for analysis.

    The structure of labels is explained using the audit logs from the intranet server in the russellmitchell data set as an example in the following. The first four labels in the labels/intranet_server/logs/audit/audit.log file are as follows:

    {"line": 1860, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

    {"line": 1861, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

    {"line": 1862, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

    {"line": 1863, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

    Each JSON object in this file assigns a label to one specific log line in the corresponding log file located at gather/intranet_server/logs/audit/audit.log. The field "line" in the JSON objects specifies the line number of the respective event in the original log file, while the field "labels" comprises the corresponding labels. For example, the lines in the sample above provide the information that lines 1860-1863 in the gather/intranet_server/logs/audit/audit.log file are labeled with "attacker_change_user" and "escalate", corresponding to the attack step where the attacker obtains escalated privileges. Inspecting these lines shows that they indeed correspond to the user authenticating as root:

    type=USER_AUTH msg=audit(1642999060.603:2226): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:authentication acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

    type=USER_ACCT msg=audit(1642999060.603:2227): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:accounting acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

    type=CRED_ACQ msg=audit(1642999060.615:2228): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:setcred acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

    type=USER_START msg=audit(1642999060.627:2229): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:session_open acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

    The same applies to all other labels for this log file and all other log files. There are no labels for logs generated by "normal" (i.e., non-attack) behavior; instead, all log events that have no corresponding JSON object in one of the files from the labels directory, such as the lines 1-1859 in the example above, can be considered to be labeled as "normal". This means that in order to figure out the labels for the log data it is necessary to store the line numbers when processing the original logs from the gather directory and see if these line numbers also appear in the corresponding file in the labels directory.
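
    A minimal sketch of that join, assuming the mirrored gather/labels layout described above (the paths match the russellmitchell example):

      import json

      def label_log_file(log_path, label_path):
          """Yield (line_number, log_line, labels) for every line of a raw log file.

          Lines without an entry in the label file are treated as "normal",
          following the convention described above.
          """
          labels_by_line = {}
          with open(label_path, encoding="utf-8") as f:
              for raw in f:
                  if raw.strip():
                      entry = json.loads(raw)
                      labels_by_line[entry["line"]] = entry["labels"]

          # Line numbers are counted from the beginning of the raw log file.
          with open(log_path, encoding="utf-8", errors="replace") as f:
              for number, line in enumerate(f, start=1):
                  yield number, line.rstrip("\n"), labels_by_line.get(number, ["normal"])

      # for n, line, labels in label_log_file(
      #         "gather/intranet_server/logs/audit/audit.log",
      #         "labels/intranet_server/logs/audit/audit.log"):
      #     ...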

    Besides the attack labels, a general overview of the exact times when specific attack steps are launched is available in gather/attacker_0/logs/attacks.log. An enumeration of all hosts and their IP addresses is stated in processing/config/servers.yml. Moreover, configurations of each host are provided in the respective gather/<host_name>/ directories.

    Version history:

    • AIT-LDS-v1.x: Four datasets, logs from single host, fine-granular audit logs, mail/CMS.
    • AIT-LDS-v2.0: Eight datasets, logs from all hosts, system logs and network traffic, mail/CMS/cloud/web.

    Acknowledgements: Partially funded by the FFG projects INDICAETING (868306) and DECEPT (873980), and the EU projects GUARD (833456) and PANDORA (SI2.835928).

    If you use the dataset, please cite the following publications:

    [1] M. Landauer, F. Skopik, M. Frank, W. Hotwagner,

  7. EDGAR Log Files (2014 - 2016)

    • redivis.com
    application/jsonl +7
    Updated Mar 13, 2025
    Cite
    Stanford Graduate School of Business Library (2025). EDGAR Log Files (2014 - 2016) [Dataset]. https://redivis.com/datasets/e16k-fn86c13fx
    Explore at:
    Available download formats: stata, arrow, application/jsonl, parquet, sas, csv, spss, avro
    Dataset updated
    Mar 13, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford Graduate School of Business Library
    Time period covered
    Jan 1, 2014 - Dec 31, 2016
    Description

    Abstract

    The EDGAR log file data set provides information on internet search traffic for EDGAR filings through SEC.gov. The data sets contain information extracted from log files from the EDGAR Archive on SEC.gov, and the information can be used to infer user access statistics.

    The current version of this dataset covers search traffic from January 1, 2014 through December 31, 2016.

    Methodology

    Due to the substantial volume of the raw EDGAR Log Files data set, we (Stanford GSB) implemented a series of transformations aimed at reducing its size while retaining essential information needed for research. Below is a summary of the modifications applied to the raw data, resulting in the four tables currently available in this Redivis dataset:

    raw_single_day_per_year:

    • This table contains a single unprocessed day of EDGAR Log Data (June 15) from each year. June 15 was selected because it is near the midpoint of the year and is not a U.S. federal holiday.
    • This table was created to help researchers examine a sample of raw data, understand how aggregation was performed in the other tables, and potentially identify trends using all the unfiltered fields.

    aggregated_{YEAR}:

    • Filter rows to include only those with a code value of '200' and doc/extention values ending in htm, txt, xml, pdf, sgml, html, or xsd.
    • Remove fields cik, time, idx, size, and browser. Our reasoning for removing these fields: cik can be obtained by merging with our EDGAR Filings dataset using accession; idx shouldn't change over time for the same doc and can be manually recreated via a transform of doc; browser is NULL in more than 99.99% of rows across logs and is fully NULL for many dates; size varies according to doc, which we have aggregated to reduce size; time does not have a time zone specified, and daily data granularity is likely sufficient for research purposes.
    • Aggregate identical rows into a doc_count that represents the number of times an IP viewed a filing each day while keeping the same browser metadata/parameters.
    • Filter aggregated data to remove rows where doc_count > 10,000.
    • Standardize data by ensuring relevant field types align with the expected usage by researchers (a rough sketch of these steps is shown below).
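
    As an illustration only, the steps above might look roughly like the following pandas sketch; the column names (code, extention, cik, time, idx, size, browser) follow the EDGAR log file documentation, and the actual Stanford GSB processing may differ in details:

      import pandas as pd

      KEEP_EXTENSIONS = ("htm", "txt", "xml", "pdf", "sgml", "html", "xsd")

      def aggregate_edgar_logs(raw: pd.DataFrame) -> pd.DataFrame:
          """Approximate the aggregated_{YEAR} transformation described above."""
          # Keep successful requests for document-like extensions only.
          extension_pattern = r"(?:" + "|".join(KEEP_EXTENSIONS) + r")$"
          filtered = raw[
              (raw["code"].astype(str) == "200")
              & raw["extention"].str.contains(extension_pattern, na=False, regex=True)
          ]
          # Drop the fields removed in the published tables.
          filtered = filtered.drop(
              columns=["cik", "time", "idx", "size", "browser"], errors="ignore"
          )
          # Collapse identical rows into a per-day view count.
          grouped = (
              filtered.groupby(list(filtered.columns), dropna=False)
              .size()
              .reset_index(name="doc_count")
          )
          # Mirror the doc_count > 10,000 filter.
          return grouped[grouped["doc_count"] <= 10_000]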

    raw_{YEAR}:

    • These tables contain a year of unprocessed EDGAR Log Data.
    • These tables were created to help researchers use the raw data and potentially identify trends using all the unfiltered fields.

    Usage

    From the SEC EDGAR Log website:

    • The full 2003 - 2017 data set does not have SEC IP addresses for some periods because, at the time, SEC users were not routed to EDGAR the same way as external visitors. For those periods of time, SEC IP addresses do not appear in the logs.
    • Due to certain limitations, including the existence of lost or damaged files, the information assembled does not capture all SEC.gov website traffic. In addition, it is possible inaccuracies or other errors were introduced into the data during the process of extracting and compiling the data.

  8. Human-Computer Interaction Logs

    • indigo.uic.edu
    zip
    Updated May 30, 2023
    Cite
    Julian Theis; Houshang Darabi (2023). Human-Computer Interaction Logs [Dataset]. http://doi.org/10.25417/uic.11923386.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    University of Illinois Chicago
    Authors
    Julian Theis; Houshang Darabi
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset comprises ten human-computer interaction logs of real participants who solved a given task in a Windows environment. The participants were allowed to use the standard notepad, calculator, and file explorer. All recordings are anonymized and do not contain any private information.

    Simple: Each of the five log files in the folder simple contains human-computer interaction recordings of a participant solving a simple task. Participants were provided 30 raw text files, each containing data about the revenue and expenses of a single product for a given time period. In total, 15 summaries were to be created by summarizing the data of two files and calculating the combined revenue, expenses, and profit.

    Complex: Each of the five log files in the folder complex contains human-computer interaction recordings of a participant solving a more advanced task. In particular, participants were given a folder of text documents and were asked to create summary documents that contain the total revenue and expenses of the quarter, profit, and, where applicable, profit improvement compared to the previous quarter and the same quarter of the previous year. Each quarter's data comprised multiple text files.

    The logging application used is the one described in: Julian Theis and Houshang Darabi. 2019. Behavioral Petri Net Mining and Automated Analysis for Human-Computer Interaction Recommendations in Multi-Application Environments. Proc. ACM Hum.-Comput. Interact. 3, EICS, Article 13 (June 2019), 16 pages. DOI: https://doi.org/10.1145/3331155

    Please refer to Table 1 and Table 2 of this publication regarding the structure of the log files. The first column corresponds to the timestamp in milliseconds, the second column represents the event key, and the third column contains additional event-specific information.
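
    A minimal sketch for reading one such recording might look as follows; it assumes whitespace-separated columns in the order just described (timestamp in milliseconds, event key, optional event-specific information), so the actual delimiter should be verified against the files and Tables 1 and 2 of the cited paper:

      from collections import namedtuple

      Event = namedtuple("Event", ["timestamp_ms", "event_key", "info"])

      def read_interaction_log(path):
          """Parse a human-computer interaction log into Event records."""
          events = []
          with open(path, encoding="utf-8") as f:
              for line in f:
                  parts = line.rstrip("\n").split(maxsplit=2)
                  if len(parts) < 2:
                      continue  # skip empty or malformed lines
                  info = parts[2] if len(parts) == 3 else ""
                  events.append(Event(int(parts[0]), parts[1], info))
          return events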

  9. Data from: Log file

    • ieee-dataport.org
    Updated Dec 20, 2023
    Cite
    SOHIL MUSANI (2023). Log file [Dataset]. https://ieee-dataport.org/documents/log-file
    Explore at:
    Dataset updated
    Dec 20, 2023
    Authors
    SOHIL MUSANI
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    human error

  10. Data from: Pillar 3: Pre-processed web server log file dataset of the banking institution

    • data.mendeley.com
    Updated Dec 6, 2021
    + more versions
    Cite
    Michal Munk (2021). Pillar 3: Pre-processed web server log file dataset of the banking institution [Dataset]. http://doi.org/10.17632/5bvkm76sdc.1
    Explore at:
    Dataset updated
    Dec 6, 2021
    Authors
    Michal Munk
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset represents the pre-processed web server log file of a commercial bank. The source of the data is the bank's web server, which recorded the accesses of web users from 2009 to 2012. It therefore contains accesses to the bank website during and after the financial crisis. Unnecessary data saved by the web server was removed to keep the focus on the textual content of the website. Several variables were added to the original log file to make the analysis workable. To preserve the privacy of website users, sensitive information in the log file was anonymized. The dataset offers a way to understand the behaviour of stakeholders during and after the crisis and how they comply with the Basel regulations.

  11. Server Logs

    • kaggle.com
    Updated Oct 12, 2021
    Cite
    Vishnu U (2021). Server Logs [Dataset]. https://www.kaggle.com/vishnu0399/server-logs/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 12, 2021
    Dataset provided by
    Kaggle, http://kaggle.com/
    Authors
    Vishnu U
    License

    CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The dataset is a synthetically generated server log based on the Apache Server Logging Format. Each line corresponds to one log entry. A log entry has the following parameters (a parsing sketch follows the list):

    Components in Log Entry:

    • IP of client: This refers to the IP address of the client that sent the request to the server.
    • Remote Log Name: Remote name of the User performing the request. In the majority of the applications, this is confidential information and is hidden or not available.
    • User ID: The ID of the user performing the request. In the majority of the applications, this is a piece of confidential information and is hidden or not available.
    • Date and Time in UTC format: The date and time of the request, represented as Day/Month/Year:Hour:Minute:Second +Time-Zone-Correction.
    • Request Type: The type of request (GET, POST, PUT, DELETE) that the server got. This depends on the operation that the request will do.
    • API: The API of the website to which the request is related. Example: when a user accesses the cart of a shopping website, the API appears as /usr/cart.
    • Protocol and Version: Protocol used for connecting with server and its version.
    • Status Code: Status code that the server returned for the request. Eg: 404 is sent when a requested resource is not found. 200 is sent when the request was successfully served.
    • Byte: The amount of data in bytes that was sent back to the client.
    • Referrer: The websites/source from where the user was directed to the current website. If none it is represented by “-“.
    • UA String: The user agent string contains details of the browser and the host device (like the name, version, device type etc.).
    • Response Time: The response time the server took to serve the request. This is the difference between the timestamps when the request was received and when the request was served.
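
    For reference, here is a minimal parsing sketch for one such entry using a regular expression, with fields in the order listed above; the sample line is purely illustrative (not taken from logfiles.log) and the exact format produced by TestFileGenerator.py may differ slightly:

      import re

      LOG_PATTERN = re.compile(
          r'(?P<ip>\S+) (?P<remote_log_name>\S+) (?P<user_id>\S+) '
          r'\[(?P<timestamp>[^\]]+)\] "(?P<method>\S+) (?P<api>\S+) (?P<protocol>[^"]+)" '
          r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)" '
          r'(?P<response_time>\S+)'
      )

      # Illustrative entry, not taken from the dataset:
      sample = (
          '192.168.1.10 - user42 [15/Oct/2021:08:30:00 +0000] "GET /usr/cart HTTP/1.1" '
          '200 512 "-" "Mozilla/5.0" 123'
      )

      match = LOG_PATTERN.match(sample)
      if match:
          print(match.groupdict())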

    Content

    The dataset consists of two files:

    • logfiles.log: the actual log file in text format
    • TestFileGenerator.py: the synthetic log file generator; the number of log entries required can be edited in the code

  12. Utah FORGE: Well 58-32 Schlumberger FMI Logs DLIS and XML files

    • data.openei.org
    • gdr.openei.org
    • +3more
    archive, data +1
    Updated Nov 17, 2017
    + more versions
    Cite
    Greg Nash; Joe Moore (2017). Utah FORGE: Well 58-32 Schlumberger FMI Logs DLIS and XML files [Dataset]. http://doi.org/10.15121/1464529
    Explore at:
    Available download formats: data, website, archive
    Dataset updated
    Nov 17, 2017
    Dataset provided by
    United States Department of Energy, http://energy.gov/
    Open Energy Data Initiative (OEDI)
    Energy and Geoscience Institute at the University of Utah
    Authors
    Greg Nash; Joe Moore
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This zipped data set includes Schlumberger FMI logs DLIS and XML files from Utah FORGE deep well 58-32. These include runs 1 (2226-7550 ft) and 2 (7440-7550 ft). Run 3 (7390-7527 ft) was acquired during phase 2C.

  13. Scripts to manage log files from URZF flight mills (2024)

    • entrepot.recherche.data.gouv.fr
    application/x-ruby +5
    Updated Apr 10, 2025
    Cite
    Daniel Sauvard; Daniel Sauvard (2025). Scripts to manage log files from URZF flight mills (2024) [Dataset]. http://doi.org/10.57745/YLMCNU
    Explore at:
    Available download formats: application/x-ruby(21222), application/x-ruby(1332), application/x-ruby(2584), bin(549), application/x-ruby(6070), application/x-ruby(30268), application/x-ruby(3207), application/x-ruby(1240), application/x-ruby(1356), text/x-r-source(98), application/x-ruby(17487), txt(418), application/x-ruby(5528), application/x-ruby(567), application/x-ruby(10824), application/x-ruby(6014), application/x-ruby(2035), application/x-ruby(24420), application/x-ruby(1374), application/x-ruby(2353), application/x-ruby(7078), application/x-ruby(1590), text/x-r-source(244), application/x-ruby(1081), text/x-tex(270), text/x-r-source(870), application/x-ruby(2575), application/x-ruby(6402), application/x-ruby(947), txt(82), application/x-ruby(1051), application/x-ruby(4984), text/x-r-source(851), application/x-shellscript(2510), application/x-ruby(11294), text/x-r-source(2747), text/x-r-source(2215), application/x-ruby(1156), application/x-ruby(12244), txt(114)
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Recherche Data Gouv
    Authors
    Daniel Sauvard; Daniel Sauvard
    License

    Etalab Open License 2.0, https://spdx.org/licenses/etalab-2.0.html

    Description

    We built flight mills to estimate the flight capacities of insects and developed programs to manage them. During experiments, these programs build specific data files (called log files). Basically, the log files are CSV-like files with a preamble and a postamble. The dataset includes scripts in the Ruby and R languages to extract data from these log files, analyze them, and produce usable R data files including summarized values. In conjunction with the log files, the scripts are able to gather local data and commands.

  14. Data from: LogChunks: A Data Set for Build Log Analysis

    • data.niaid.nih.gov
    Updated Jan 31, 2020
    Cite
    Panichella, Annibale (2020). LogChunks: A Data Set for Build Log Analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3632350
    Explore at:
    Dataset updated
    Jan 31, 2020
    Dataset provided by
    Brandt, Carolin
    Panichella, Annibale
    Beller, Moritz
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We collected 797 Travis CI logs from a wide range of 80 GitHub repositories covering 29 different main development languages. You can find our collection tool in log-collection and the logs, sorted by language and repository, in logs.

    We manually labeled the part (chunk) of each log describing why the build failed. In addition, the chunks are annotated with keywords that we would use to search for them and categorized according to their structural representation within the log. You can find this data in an XML file for each repository in build-failure-reason.

  15. IWW 24 Hour Log-Master File/Data (Categorized Records)

    • catalog.data.gov
    Updated Jul 13, 2025
    Cite
    DHS (2025). IWW 24 Hour Log-Master File/Data (Categorized Records) [Dataset]. https://catalog.data.gov/dataset/iww-24-hour-log-master-file-data-categorized-records-51f4b
    Explore at:
    Dataset updated
    Jul 13, 2025
    Dataset provided by
    U.S. Department of Homeland Security, http://www.dhs.gov/
    Description

    The 24-Hour Log is set up to alert management of records that will be due for review prior to the expiration date. At that time (or any time beforehand) a record can be reviewed and certified by the analyst as still having a mission need to retain the information. The expiration date will then be set for an additional year out. This can go on for as long as the information is deemed necessary for the mission. If a record arrives at its expiration date without being reviewed and approved, the record will automatically be purged from the system.

  16. Electricity Baseline 2020 Background Data and Log File

    • osti.gov
    Updated Jun 16, 2025
    + more versions
    Cite
    Jamieson, Matthew; W Davis, Tyler; Young, Ben (2025). Electricity Baseline 2020 Background Data and Log File [Dataset]. https://www.osti.gov/dataexplorer/biblio/dataset/2569605
    Explore at:
    Dataset updated
    Jun 16, 2025
    Dataset provided by
    USDOE Office of Fossil Energy (FE)
    NETL
    Authors
    Jamieson, Matthew; W Davis, Tyler; Young, Ben
    Description

    The ElectricityLCI v2 Python package (https://github.com/USEPA/ElectricityLCI/tree/v2.0) was used to generate the 2020 electricity baseline: a regionalized life cycle inventory model of U.S. electricity generation, consumption, and distribution using standardized facility and generation data. ElectricityLCI implements a local data store for downloading and accessing public data on an individual's computer. The data store follows the folder definition provided by USEPA's esupy Python package (https://github.com/USEPA/esupy), which utilizes the appdirs Python dependency (https://pypi.org/project/appdirs/). An overview of the ElectricityLCI data stores may be found in the README (https://github.com/USEPA/ElectricityLCI/blob/v2.0/README.md#data-store). This submission includes the background data used to generate the 2020 electricity baseline inventory. Each zip archive stores the source files as found in their data stores. Sub-folders in each of the data stores are archived separately. For example, stewi.zip contains the JSON files, while stewi.facility.zip is the 'facility' sub-folder of the stewi data store that holds the parquet files. To reproduce the data store, extract each zip file, drag and drop the sub-folders into their appropriate root folders to recreate the data stores, then copy the root folders to your data store folder (as returned by running the following on the command line: python -c "import appdirs; print(appdirs.user_data_dir())"). The five main data stores are: 'electricitylci', 'facilitymatcher', 'fedelemflowlist', 'stewi', and 'stewicombo'. The log file generated by the 2020 model run is also included; it contains the statements at the DEBUG level and above.

  17. Data from: Logfile

    • huggingface.co
    Updated Jun 5, 2024
    Cite
    Namit G (2024). Logfile [Dataset]. https://huggingface.co/datasets/Namitg02/Logfile
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 5, 2024
    Authors
    Namit G
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The Namitg02/Logfile dataset is hosted on Hugging Face and contributed by the HF Datasets community.

  18. LO2: Microservice Dataset of Logs and Metrics

    • zenodo.org
    bin, pdf, zip
    Updated Feb 28, 2025
    + more versions
    Cite
    Alexander Bakhtin; Jesse Nyyssölä; Yuqing Wang; Noman Ahmad; Ke Ping; Matteo Esposito; Mika Mäntylä; Davide Taibi (2025). LO2: Microservice Dataset of Logs and Metrics [Dataset]. http://doi.org/10.5281/zenodo.14938118
    Explore at:
    Available download formats: zip, bin, pdf
    Dataset updated
    Feb 28, 2025
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Alexander Bakhtin; Jesse Nyyssölä; Yuqing Wang; Noman Ahmad; Ke Ping; Matteo Esposito; Mika Mäntylä; Davide Taibi
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LO2 dataset

    This is the data repository for the LO2 dataset.

    Here is an overview of the contents.

    lo2-data.zip

    This is the main dataset. This is the completely unedited output of our data collection process. Note that the uncompressed size is around 540 GB. For more information, see the paper and the data-appendix in this repository.

    lo2-sample.zip

    This is a sample that contains the data used for the preliminary analysis. It contains only service logs and the most relevant metrics for the first 100 runs. Furthermore, the metrics are combined at the run level into a single CSV to make them easier to utilize.

    data-appendix.pdf

    This document contains further details and stats about the full dataset. These include file size distributions, empty file analysis, log type analysis and the appearance of an unknown file.

    lo2-scripts.zip

    Various scripts for processing the data to create the sample, to conduct the preliminary analysis and to create the statistics seen in the data-appendix.

    • csv_generator.py, csv_merge*.py: These scripts create and combine the metrics into csv files. They need to be run in order. Merging runs to global is very memory intensive.
    • findempty.py: Finds empty files in the folders. As some are expected to be empty, it also counts the unexpected ones. Used in data-appendix.
    • loglead_lo2.py: Script for the preliminary analysis of the logs for error detection. Requires LogLead version 1.2.1.
    • logstats.py: Counts log lines and their type. Used for creating the figure of number of lines per type and service.
    • node_exporter_metrics.txt: Metric descriptions exported from Prometheus (text file).
    • pca.py: The Principal Component Analysis script used for preliminary analysis.
    • reduce_logs.py: Very important for fair analysis: at the beginning of the files there are some initialization rows that leak information regarding correctness.
    • requirements.txt: Required Python libraries to run the scripts.
    • sizedist.py: Creating distributions of file sizes per filename for the data-appendix.

    Version v3: Updated data appendix introduction, added another stage in the log analysis process in loglead_lo2.py

  19. IWW 24 Hour Log-Master File/Data (24 Hour Log Data)

    • catalog.data.gov
    • gimi9.com
    • +1more
    Updated Jul 13, 2025
    Cite
    DHS (2025). IWW 24 Hour Log-Master File/Data (24 Hour Log Data) [Dataset]. https://catalog.data.gov/dataset/iww-24-hour-log-master-file-data-24-hour-log-data-eb3d3
    Explore at:
    Dataset updated
    Jul 13, 2025
    Dataset provided by
    U.S. Department of Homeland Security, http://www.dhs.gov/
    Description

    The 24-Hour Log data can only be retained if the data is relevant to the Homeland Security mission and can be legally retained under Intelligence Oversight regulations. The information entered into the log depends on the content of the source report used to generate the log entry. The information for each incident varies depending on the incident and the circumstances surrounding the collection of information about it. Information may be collected about the person who reported the incident and people involved in a reported incident, which may turn up varying levels of personal information, most often name and citizenship. Additional personal information may be collected and may include, but is not limited to, Social Security Number, passport or driver's license numbers or other identifying information; location of residency, names of associates, political or religious affiliations or membership in some group or organization, and other information deemed important by the reporting official.

  20. Comprehensive Network Logs Dataset for Multi-Device Analysis

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 11, 2024
    Cite
    Hasan, Raza (2024). Comprehensive Network Logs Dataset for Multi-Device Analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10492769
    Explore at:
    Dataset updated
    Jan 11, 2024
    Dataset provided by
    Hasan, Raza
    Salman, Mahmood
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset comprises diverse logs from various sources, including cloud services, routers, switches, virtualization, network security appliances, authentication systems, DNS, operating systems, packet captures, proxy servers, servers, syslog data, and network data. The logs encompass a wide range of information such as traffic details, user activities, authentication events, DNS queries, network flows, security actions, and system events. By analyzing these logs collectively, users can gain insights into network patterns, anomalies, user authentication, cloud service usage, DNS traffic, network flows, security incidents, and system activities. The dataset is invaluable for network monitoring, performance analysis, anomaly detection, security investigations, and correlating events across the entire network infrastructure.
