67 datasets found
  1. Student Learning Interaction Logs Dataset

    • kaggle.com
    zip
    Updated Aug 8, 2025
    Cite
    Ziya (2025). Student Learning Interaction Logs Dataset [Dataset]. https://www.kaggle.com/datasets/ziya07/student-learning-interaction-logs-dataset
    Explore at:
    zip (425369 bytes). Available download formats
    Dataset updated
    Aug 8, 2025
    Authors
    Ziya
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    📝 Description: This dataset captures simulated student interactions in a digital learning environment. Each row represents a unique learning session, containing comprehensive information about student behavior, engagement, performance, and progression over time.

    The dataset is designed to support research and development in personalized education, adaptive learning systems, student engagement analysis, and feedback optimization. It enables the study of learning patterns and offers insights into how students interact with digital content, how they perform in assessments, and how their learning behavior evolves across sessions.

    🌟 Key Features:

    • student_id: Unique identifier for each student
    • session_id: Unique ID for each learning session
    • timestamp: Date and time of the session
    • module_id: Course/module accessed during the session
    • time_spent_minutes: Time spent in the session (in minutes)
    • pages_visited: Number of content pages visited
    • video_watched_percent: Percentage of video watched during the session
    • click_events: Number of interactions (clicks, navigations, etc.)
    • notes_taken: Whether the student took notes (1 = yes, 0 = no)
    • forum_posts: Number of forum posts/comments made
    • revisit_flag: Indicates if content was revisited
    • quiz_score: Score obtained in the session quiz (0–100)
    • attempts_taken: Number of quiz attempts made
    • assignment_score: Score in the session's assignment (0–100)
    • feedback_rating: Student's feedback rating for the session (1–5)
    • days_since_last_activity: Number of days since last session
    • cumulative_quiz_score: Running total of all previous quiz scores
    • learning_trend: Average performance across sessions
    • attention_score: Derived indicator of engagement during the session
    • feedback_type: Type of feedback given (e.g., revise topic, pace slow)
    • next_module_prediction: Suggested next module for the student
    • success_label: Indicator of learning success (1 = successful, 0 = not)

    📊 Dataset Overview: Total Records: 9,000+ learning sessions

    Total Students: 300

    Total Features: 22

    Data Format: CSV

    Time-Series Ready: Yes (sequential session data per student)

    💡 Use Cases: Analyze and visualize student learning patterns

    Evaluate content engagement and session behavior

    Develop personalized learning dashboards or analytics tools

    Simulate adaptive feedback systems for digital education
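    As a starting point for the analysis use cases above, sessions can be aggregated per student. A minimal sketch, assuming the column names from the feature list; the CSV path in the usage comment is hypothetical:

```python
# Sketch: per-student engagement summary from the session log.
# Column names follow the feature list above; the file path is an assumption.
import csv
from collections import defaultdict
from statistics import mean

def student_summary(rows):
    """Aggregate session rows into per-student mean quiz score and total minutes."""
    scores = defaultdict(list)
    minutes = defaultdict(float)
    for row in rows:
        sid = row["student_id"]
        scores[sid].append(float(row["quiz_score"]))
        minutes[sid] += float(row["time_spent_minutes"])
    return {sid: {"mean_quiz_score": mean(v), "total_minutes": minutes[sid]}
            for sid, v in scores.items()}

# Usage (hypothetical file name):
# with open("student_learning_interaction_logs.csv", newline="") as f:
#     print(student_summary(csv.DictReader(f)))
```

    The same grouping pattern extends to the other behavioral columns (pages_visited, click_events, etc.).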

  2. Comprehensive Network Logs Dataset for Multi-Device Analysis

    • data.niaid.nih.gov
    Updated Jan 11, 2024
    Cite
    Salman, Mahmood; Hasan, Raza (2024). Comprehensive Network Logs Dataset for Multi-Device Analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10492769
    Explore at:
    Dataset updated
    Jan 11, 2024
    Dataset provided by
    Southampton Solent University
    Malaysia University of Science and Technology
    Authors
    Salman, Mahmood; Hasan, Raza
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset comprises diverse logs from various sources, including cloud services, routers, switches, virtualization, network security appliances, authentication systems, DNS, operating systems, packet captures, proxy servers, servers, syslog data, and network data. The logs encompass a wide range of information such as traffic details, user activities, authentication events, DNS queries, network flows, security actions, and system events. By analyzing these logs collectively, users can gain insights into network patterns, anomalies, user authentication, cloud service usage, DNS traffic, network flows, security incidents, and system activities. The dataset is invaluable for network monitoring, performance analysis, anomaly detection, security investigations, and correlating events across the entire network infrastructure.

  3. Process Mining Event Log - Incident Management

    • kaggle.com
    zip
    Updated Apr 20, 2025
    Cite
    Alberto P (2025). Process Mining Event Log - Incident Management [Dataset]. https://www.kaggle.com/datasets/albertopmd/process-mining-event-log-incident-management
    Explore at:
    zip (2301112 bytes). Available download formats
    Dataset updated
    Apr 20, 2025
    Authors
    Alberto P
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This realistic incident management event log simulates a common IT service process and includes key inefficiencies found in real-world operations. You'll uncover SLA violations, multiple reassignments, bottlenecks, and conformance issues—making it an ideal dataset for hands-on process mining, root cause analysis, and performance optimization exercises.

    You can find more event logs + use case handbooks to guide your analysis here: https://processminingdata.com/

    Standard Process Flow: Ticket Created -> Ticket Assigned to Level 1 Support -> WIP - Level 1 Support -> Level 1 Escalates to Level 2 Support -> WIP - Level 2 Support -> Ticket Solved by Level 2 Support -> Customer Feedback Received -> Ticket Closed

    Total Number of Incident Tickets: 31,000+

    Process Variants: 13

    Number of Events: 242,000+

    Year: 2023

    File Format: CSV

    File Size: 65MB
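    The 13 process variants mentioned above can be recovered directly from the event log by grouping events per ticket and counting the distinct activity sequences. A minimal sketch; the event tuple layout (case id, timestamp, activity) is an assumption, so check the actual CSV header before adapting it:

```python
# Sketch: counting process variants (distinct activity sequences per ticket).
# The (case_id, timestamp, activity) tuple layout is an assumption.
from collections import defaultdict, Counter

def count_variants(events):
    """events: iterable of (case_id, timestamp, activity) tuples."""
    traces = defaultdict(list)
    # Order events within each case by timestamp before building traces.
    for case_id, ts, activity in sorted(events, key=lambda e: (e[0], e[1])):
        traces[case_id].append(activity)
    return Counter(tuple(t) for t in traces.values())
```

    Each key of the resulting Counter is one variant (a full activity sequence such as the standard flow above), and its value is the number of tickets following it.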

  4. 📱📳📴📶 Application logs on mobile devices

    • kaggle.com
    Updated Oct 7, 2024
    Cite
    Alexander Kapturov (2024). 📱📳📴📶 Application logs on mobile devices [Dataset]. https://www.kaggle.com/datasets/kapturovalexander/application-logs-on-mobile-devices
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 7, 2024
    Dataset provided by
    Kaggle
    Authors
    Alexander Kapturov
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    😊 If you downloaded this dataset or found it useful, please upvote it!

    Description:

    This dataset contains records of events and errors related to the operation of mobile applications on various mobile devices. Each entry includes information about the timestamp, device characteristics, session identifiers, and textual descriptions of events or errors.

    Data Fields:

    • Status: A numerical indicator of the event status (e.g., 0 for success, 1 for error).
    • Event: A textual description of the action or event, including error text if an error occurred.
    • Device Identification: Information about the mobile device, including model and Android version.
    • App Version: The version of the mobile application experiencing the event.
    • App Language: The language in which the application is running.
    • Android Version: The version of the Android operating system on the device.
    • Session Identifiers: Unique session or device identifiers associated with the event.
    • Additional Data: Additional event details, such as the country and other characteristics.

  5. AIT Log Data Set V2.0

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Jun 28, 2024
    + more versions
    Cite
    Max Landauer; Florian Skopik; Maximilian Frank; Wolfgang Hotwagner; Markus Wurzenberger; Andreas Rauber (2024). AIT Log Data Set V2.0 [Dataset]. http://doi.org/10.5281/zenodo.5789064
    Explore at:
    zip. Available download formats
    Dataset updated
    Jun 28, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Max Landauer; Florian Skopik; Maximilian Frank; Wolfgang Hotwagner; Markus Wurzenberger; Andreas Rauber
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    AIT Log Data Sets

    This repository contains synthetic log data suitable for evaluation of intrusion detection systems, federated learning, and alert aggregation. A detailed description of the dataset is available in [1]. The logs were collected from eight testbeds that were built at the Austrian Institute of Technology (AIT) following the approach by [2]. Please cite these papers if the data is used for academic publications.

    In brief, each of the datasets corresponds to a testbed representing a small enterprise network including mail server, file share, WordPress server, VPN, firewall, etc. Normal user behavior is simulated to generate background noise over a time span of 4-6 days. At some point, a sequence of attack steps is launched against the network. Log data is collected from all hosts and includes Apache access and error logs, authentication logs, DNS logs, VPN logs, audit logs, Suricata logs, network traffic packet captures, horde logs, exim logs, syslog, and system monitoring logs. Separate ground truth files are used to label events that are related to the attacks. Compared to AIT-LDS-v1.1, a more complex network and more diverse user behavior are simulated, and logs are collected from all hosts in the network. If you are only interested in network traffic analysis, we also provide the AIT-NDS containing the labeled netflows of the testbed networks. We also provide the AIT-ADS, an alert data set derived by forensically applying open-source intrusion detection systems on the log data.

    The datasets in this repository have the following structure:

    • The gather directory contains all logs collected from the testbed. Logs collected from each host are located in gather/.
    • The labels directory contains the ground truth of the dataset that indicates which events are related to attacks. The directory mirrors the structure of the gather directory so that each label file is located at the same path and has the same name as the corresponding log file. Each line in the label files references the log event corresponding to an attack by the line number counted from the beginning of the file ("line"), the labels assigned to the line that state the respective attack step ("labels"), and the labeling rules that assigned the labels ("rules"). An example is provided below.
    • The processing directory contains the source code that was used to generate the labels.
    • The rules directory contains the labeling rules.
    • The environment directory contains the source code that was used to deploy the testbed and run the simulation using the Kyoushi Testbed Environment.
    • The dataset.yml file specifies the start and end time of the simulation.

    The following table summarizes relevant properties of the datasets:

    • fox
      • Simulation time: 2022-01-15 00:00 - 2022-01-20 00:00
      • Attack time: 2022-01-18 11:59 - 2022-01-18 13:15
      • Scan volume: High
      • Unpacked size: 26 GB
    • harrison
      • Simulation time: 2022-02-04 00:00 - 2022-02-09 00:00
      • Attack time: 2022-02-08 07:07 - 2022-02-08 08:38
      • Scan volume: High
      • Unpacked size: 27 GB
    • russellmitchell
      • Simulation time: 2022-01-21 00:00 - 2022-01-25 00:00
      • Attack time: 2022-01-24 03:01 - 2022-01-24 04:39
      • Scan volume: Low
      • Unpacked size: 14 GB
    • santos
      • Simulation time: 2022-01-14 00:00 - 2022-01-18 00:00
      • Attack time: 2022-01-17 11:15 - 2022-01-17 11:59
      • Scan volume: Low
      • Unpacked size: 17 GB
    • shaw
      • Simulation time: 2022-01-25 00:00 - 2022-01-31 00:00
      • Attack time: 2022-01-29 14:37 - 2022-01-29 15:21
      • Scan volume: Low
      • Data exfiltration is not visible in DNS logs
      • Unpacked size: 27 GB
    • wardbeck
      • Simulation time: 2022-01-19 00:00 - 2022-01-24 00:00
      • Attack time: 2022-01-23 12:10 - 2022-01-23 12:56
      • Scan volume: Low
      • Unpacked size: 26 GB
    • wheeler
      • Simulation time: 2022-01-26 00:00 - 2022-01-31 00:00
      • Attack time: 2022-01-30 07:35 - 2022-01-30 17:53
      • Scan volume: High
      • No password cracking in attack chain
      • Unpacked size: 30 GB
    • wilson
      • Simulation time: 2022-02-03 00:00 - 2022-02-09 00:00
      • Attack time: 2022-02-07 10:57 - 2022-02-07 11:49
      • Scan volume: High
      • Unpacked size: 39 GB

    The following attacks are launched in the network:

    • Scans (nmap, WPScan, dirb)
    • Webshell upload (CVE-2020-24186)
    • Password cracking (John the Ripper)
    • Privilege escalation
    • Remote command execution
    • Data exfiltration (DNSteal)

    Note that attack parameters and their execution orders vary in each dataset. Labeled log files are trimmed to the simulation time to ensure that their labels (which reference the related event by the line number in the file) are not misleading. Other log files, however, also contain log events generated before or after the simulation time and may therefore be affected by testbed setup or data collection. It is therefore recommended to only consider logs with timestamps within the simulation time for analysis.

    The structure of labels is explained using the audit logs from the intranet server in the russellmitchell data set as an example in the following. The first four labels in the labels/intranet_server/logs/audit/audit.log file are as follows:

    {"line": 1860, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

    {"line": 1861, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

    {"line": 1862, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

    {"line": 1863, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

    Each JSON object in this file assigns a label to one specific log line in the corresponding log file located at gather/intranet_server/logs/audit/audit.log. The field "line" in each JSON object specifies the line number of the respective event in the original log file, while the field "labels" comprises the corresponding labels. For example, the lines in the sample above provide the information that lines 1860-1863 in the gather/intranet_server/logs/audit/audit.log file are labeled with "attacker_change_user" and "escalate", corresponding to the attack step where the attacker receives escalated privileges. Inspecting these lines shows that they indeed correspond to the user authenticating as root:

    type=USER_AUTH msg=audit(1642999060.603:2226): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:authentication acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

    type=USER_ACCT msg=audit(1642999060.603:2227): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:accounting acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

    type=CRED_ACQ msg=audit(1642999060.615:2228): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:setcred acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

    type=USER_START msg=audit(1642999060.627:2229): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:session_open acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

    The same applies to all other labels for this log file and all other log files. There are no labels for logs generated by "normal" (i.e., non-attack) behavior; instead, all log events that have no corresponding JSON object in one of the files from the labels directory, such as the lines 1-1859 in the example above, can be considered to be labeled as "normal". This means that in order to figure out the labels for the log data it is necessary to store the line numbers when processing the original logs from the gather directory and see if these line numbers also appear in the corresponding file in the labels directory.
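    The line-number matching described above can be sketched in a few lines of Python; this is a minimal illustration of the scheme, not an official tool from the dataset authors:

```python
# Sketch: attach labels from a labels file (one JSON object per line)
# to the corresponding log lines by line number. Log lines without an
# entry in the labels file are treated as "normal", as described above.
import json

def label_log_lines(log_lines, label_lines):
    by_line = {}
    for raw in label_lines:
        if raw.strip():
            obj = json.loads(raw)
            by_line[obj["line"]] = obj["labels"]
    # Line numbers in the label files are 1-based.
    return [(no, line, by_line.get(no, ["normal"]))
            for no, line in enumerate(log_lines, start=1)]
```

    Passing the lines of gather/intranet_server/logs/audit/audit.log and the matching labels file from the labels directory would yield, e.g., ["attacker_change_user", "escalate"] for lines 1860-1863 and ["normal"] for lines 1-1859.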

    Beside the attack labels, a general overview of the exact times when specific attack steps are launched are available in gather/attacker_0/logs/attacks.log. An enumeration of all hosts and their IP addresses is stated in processing/config/servers.yml. Moreover, configurations of each host are provided in gather/ and gather/.

    Version history:

    • AIT-LDS-v1.x: Four datasets, logs from single host, fine-granular audit logs, mail/CMS.
    • AIT-LDS-v2.0: Eight datasets, logs from all hosts, system logs and network traffic, mail/CMS/cloud/web.

    Acknowledgements: Partially funded by the FFG projects INDICAETING (868306) and DECEPT (873980), and the EU projects GUARD (833456) and PANDORA (SI2.835928).

    If you use the dataset, please cite the following publications:

    [1] M. Landauer, F. Skopik, M. Frank, W. Hotwagner,

  6. NLPLog.json

    • figshare.com
    txt
    Updated May 24, 2025
    Cite
    Yuhe Ji (2025). NLPLog.json [Dataset]. http://doi.org/10.6084/m9.figshare.29143721.v1
    Explore at:
    txt. Available download formats
    Dataset updated
    May 24, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yuhe Ji
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset developed in the paper titled 'Adapting Large Language Models to Log Analysis with Interpretable Domain Knowledge' is designed to transform information found in logs into interpretable knowledge for subsequent use in large language model training.

  7. SAP Application Log Analysis pilot dataset

    • data.europa.eu
    unknown
    Updated Feb 14, 2019
    Cite
    Zenodo (2019). SAP Application Log Analysis pilot dataset [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-2566022?locale=de
    Explore at:
    unknown (11261081). Available download formats
    Dataset updated
    Feb 14, 2019
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    Description

    A CSV file prepared from application logs stemming from an SAP BI Warehouse system. This realistic dataset was generated as a means to showcase the SAP pilot use-case of the TOREADOR project. Each line corresponds to a user action. Extid, object and subobject were extracted from the BI system logs, along with the user name and event date. Role was retrieved from the standard user actions. Label indicates whether the event is benign or malign:

    • Elevation_of_privileges: an event the user should not be able to perform within the boundaries of his role.
    • Priv_abuse: a privileged account performing an action breaching a confidentiality clause (e.g., an administrator reading sensitive data).
    • Forgotten_user: a user who stayed inactive for a long time before being used again (e.g., an employee who left the company but whose account was not terminated).

    Records where no malign activity was detected were marked 'benign'. Outbushours was computed from the time of the action and mapped to 'inside' or 'outside' business hours.

  8. AIT Log Data Set V1.1

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 18, 2023
    Cite
    Max Landauer; Florian Skopik; Markus Wurzenberger; Wolfgang Hotwagner; Andreas Rauber (2023). AIT Log Data Set V1.1 [Dataset]. http://doi.org/10.5281/zenodo.4264796
    Explore at:
    zip. Available download formats
    Dataset updated
    Oct 18, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Max Landauer; Florian Skopik; Markus Wurzenberger; Wolfgang Hotwagner; Andreas Rauber
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    AIT Log Data Sets

    This repository contains synthetic log data suitable for evaluation of intrusion detection systems. The logs were collected from four independent testbeds that were built at the Austrian Institute of Technology (AIT) following the approach by Landauer et al. (2020) [1]. Please refer to the paper for more detailed information on automatic testbed generation and cite it if the data is used for academic publications. In brief, each testbed simulates user accesses to a webserver that runs Horde Webmail and OkayCMS. The duration of the simulation is six days. On the fifth day (2020-03-04) two attacks are launched against each web server.

    The archive AIT-LDS-v1_0.zip contains the directories "data" and "labels".

    The data directory is structured as follows. Each directory mail.

    Setup details of the web servers:

    • OS: Debian Stretch 9.11.6
    • Services:
      • Apache2
      • PHP7
      • Exim 4.89
      • Horde 5.2.22
      • OkayCMS 2.3.4
      • Suricata
      • ClamAV
      • MariaDB

    Setup details of user machines:

    • OS: Ubuntu Bionic
    • Services:
      • Chromium
      • Firefox

    User host machines are assigned to web servers in the following way:

    • mail.cup.com is accessed by users from host machines user-{0, 1, 2, 6}
    • mail.spiral.com is accessed by users from host machines user-{3, 5, 8}
    • mail.insect.com is accessed by users from host machines user-{4, 9}
    • mail.onion.com is accessed by users from host machines user-{7, 10}

    The following attacks are launched against the web servers (different starting times for each web server, please check the labels for exact attack times):

    • Attack 1: multi-step attack with sequential execution of the following attacks:
      • nmap scan
      • nikto scan
      • smtp-user-enum tool for account enumeration
      • hydra brute force login
      • webshell upload through Horde exploit (CVE-2019-9858)
      • privilege escalation through Exim exploit (CVE-2019-10149)
    • Attack 2: webshell injection through malicious cookie (CVE-2019-16885)

    Attacks are launched from the following user host machines. In each of the corresponding directories user-

    • user-6 attacks mail.cup.com
    • user-5 attacks mail.spiral.com
    • user-4 attacks mail.insect.com
    • user-7 attacks mail.onion.com

    The log data collected from the web servers includes

    • Apache access and error logs
    • syscall logs collected with the Linux audit daemon
    • suricata logs
    • exim logs
    • auth logs
    • daemon logs
    • mail logs
    • syslogs
    • user logs


    Note that due to their large size, the audit/audit.log files of each server were compressed into a .zip archive. If these logs are needed for analysis, they must first be unzipped.

    Labels are organized in the same directory structure as logs. Each file contains two labels for each log line, separated by a comma: the first is based on the occurrence time, the second on similarity and ordering. Note that this does not guarantee correct labeling for all lines and that no manual corrections were conducted.
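    Given that format, a label line can be split into its two components as follows; a minimal sketch, assuming exactly two comma-separated values per line as described above:

```python
# Sketch: parse one AIT-LDS-v1.x label line into its two labels.
# Assumes two comma-separated values per line: a time-based label
# and a similarity/ordering-based label.
def parse_label_line(line):
    time_label, similarity_label = line.strip().split(",", 1)
    return time_label, similarity_label
```

    Disagreement between the two labels for a line may indicate events where the labeling heuristics were uncertain.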

    Version history and related data sets:

    • AIT-LDS-v1.0: Four datasets, logs from single host, fine-granular audit logs, mail/CMS.
      • AIT-LDS-v1.1: Removed carriage return of line endings in audit.log files.
    • AIT-LDS-v2.0: Eight datasets, logs from all hosts, system logs and network traffic, mail/CMS/cloud/web.

    Acknowledgements: Partially funded by the FFG projects INDICAETING (868306) and DECEPT (873980), and the EU project GUARD (833456).

    If you use the dataset, please cite the following publication:

    [1] M. Landauer, F. Skopik, M. Wurzenberger, W. Hotwagner and A. Rauber, "Have it Your Way: Generating Customized Log Datasets With a Model-Driven Simulation Testbed," in IEEE Transactions on Reliability, vol. 70, no. 1, pp. 402-415, March 2021, doi: 10.1109/TR.2020.3031317. [PDF]

  9. OS Kernel Anomaly Dataset

    • kaggle.com
    zip
    Updated May 5, 2025
    Cite
    Ziya (2025). OS Kernel Anomaly Dataset [Dataset]. https://www.kaggle.com/datasets/ziya07/os-kernel-anomaly-dataset
    Explore at:
    zip (15689 bytes). Available download formats
    Dataset updated
    May 5, 2025
    Authors
    Ziya
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is designed to support research in anomaly detection for OS kernels, particularly in the context of power monitoring systems used in embedded environments. It simulates the interaction between system-level operations and power consumption behaviors, providing a rich set of features for training and evaluating hybrid models.

    The dataset contains 1,000 records of simulated yet realistic system behavior, including:

    System call sequences

    Power usage logs (in watts)

    CPU and memory utilization

    Process identifiers and names

    Timestamps

    Labeled entries (Normal or Anomaly)

    Anomalies are injected using fuzzy testing principles to simulate abnormal power spikes, syscall irregularities, or excessive resource usage, mimicking real-world kernel faults or malicious activity. This dataset enables the development of robust models that can learn complex, uncertain system behavior patterns for enhanced security and stability of embedded power monitoring applications.

  10. Failure dataset

    • figshare.com
    zip
    Updated May 30, 2023
    Cite
    Pietro Liguori; Luigi De Simone; Roberto Natella (2023). Failure dataset [Dataset]. http://doi.org/10.6084/m9.figshare.7732268.v3
    Explore at:
    zip. Available download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Pietro Liguori; Luigi De Simone; Roberto Natella
    License

    https://www.gnu.org/licenses/gpl-3.0.html

    Description

    This failure dataset contains the injected faults, the workload, the effects of failure (both the user-side impact and our own in-depth correctness checks), and the error logs produced by the OpenStack cloud management system. Please refer to the paper "How Bad Can a Bug Get? Empirical Analysis of Software Failures in the OpenStack Cloud Computing Platform" (ESEC/FSE '19). Please cite the following paper if you use the dataset:

    @inproceedings{cotroneo2019bad, title={How bad can a bug get? an empirical analysis of software failures in the OpenStack cloud computing platform}, author={Cotroneo, Domenico and De Simone, Luigi and Liguori, Pietro and Natella, Roberto and Bidokhti, Nematollah}, booktitle={Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering}, pages={200--211}, year={2019}}

    Visit the GitHub repo for any updates: https://github.com/dessertlab/Fault-Injection-Dataset

  11. Dataset: An IoT-Enriched Event Log for Process Mining in Smart Factories

    • figshare.com
    txt
    Updated Jun 5, 2024
    + more versions
    Cite
    Lukas Malburg; Joscha Grüger; Ralph Bergmann (2024). Dataset: An IoT-Enriched Event Log for Process Mining in Smart Factories [Dataset]. http://doi.org/10.6084/m9.figshare.20130794.v6
    Explore at:
    txt. Available download formats
    Dataset updated
    Jun 5, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Lukas Malburg; Joscha Grüger; Ralph Bergmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Modern technologies such as the Internet of Things (IoT) are becoming increasingly important in various domains, including Business Process Management (BPM) research. One main research area in BPM is process mining, which can be used to analyze event logs, e.g., for checking the conformance of running processes. However, there are only a few IoT-based event logs available for research purposes. Some of them are artificially generated and do not always completely reflect the actual physical properties of smart environments. In this paper, we present an IoT-enriched XES event log that is generated by a physical smart factory. For this purpose, we create the DataStream/SensorStream XES extension for representing IoT data in event logs. Finally, we present some preliminary analysis and properties of the log.

  12. Data from: Multi-Source Distributed System Data for AI-powered Analytics

    • zenodo.org
    zip
    Updated Nov 10, 2022
    Cite
    Sasho Nedelkoski; Jasmin Bogatinovski; Ajay Kumar Mandapati; Soeren Becker; Jorge Cardoso; Odej Kao (2022). Multi-Source Distributed System Data for AI-powered Analytics [Dataset]. http://doi.org/10.5281/zenodo.3549604
    Explore at:
    zip. Available download formats
    Dataset updated
    Nov 10, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sasho Nedelkoski; Jasmin Bogatinovski; Ajay Kumar Mandapati; Soeren Becker; Jorge Cardoso; Odej Kao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract:

    In recent years there has been an increased interest in Artificial Intelligence for IT Operations (AIOps). This field utilizes monitoring data from IT systems, big data platforms, and machine learning to automate various operations and maintenance (O&M) tasks for distributed systems.
    The major contributions have materialized in the form of novel algorithms.
    Typically, researchers have taken on the challenge of exploring one specific type of observability data source, such as application logs, metrics, or distributed traces, to create new algorithms.
    Nonetheless, due to the low signal-to-noise ratio of monitoring data, there is a consensus that only the analysis of multi-source monitoring data will enable the development of useful algorithms with better performance.
    Unfortunately, existing datasets usually contain only a single source of data, often logs or metrics. This limits the possibilities for greater advances in AIOps research.
    Thus, we generated high-quality multi-source data composed of distributed traces, application logs, and metrics from a complex distributed system. This paper provides detailed descriptions of the experiment, statistics of the data, and identifies how such data can be analyzed to support O&M tasks such as anomaly detection, root cause analysis, and remediation.

    General Information:

    This repository contains simple scripts for data statistics and a link to the multi-source distributed system dataset.

    You may find details of this dataset from the original paper:

    Sasho Nedelkoski, Jasmin Bogatinovski, Ajay Kumar Mandapati, Soeren Becker, Jorge Cardoso, Odej Kao, "Multi-Source Distributed System Data for AI-powered Analytics".

    If you use the data, implementation, or any details of the paper, please cite!

    BIBTEX:


    @inproceedings{nedelkoski2020multi,
     title={Multi-source Distributed System Data for AI-Powered Analytics},
     author={Nedelkoski, Sasho and Bogatinovski, Jasmin and Mandapati, Ajay Kumar and Becker, Soeren and Cardoso, Jorge and Kao, Odej},
     booktitle={European Conference on Service-Oriented and Cloud Computing},
     pages={161--176},
     year={2020},
     organization={Springer}
    }
    


    The multi-source/multimodal dataset is composed of distributed traces, application logs, and metrics produced by running a complex distributed system (OpenStack). In addition, we also provide the workload and fault scripts together with the Rally report, which can serve as ground truth. We provide two datasets, which differ in how the workload is executed. The sequential_data is generated by executing a workload of sequential user requests; the concurrent_data is generated by executing a workload of concurrent user requests.

    The raw logs in both datasets contain the same files. Users who want the logs filtered by time for each of the two datasets should refer to the timestamps in the metrics (they provide the time window). In addition, we suggest using the provided aggregated, time-ranged logs for both datasets in CSV format.

    Important: The logs and the metrics are synchronized in time, and both are recorded in CEST (Central European Summer Time, UTC+2). The traces are in UTC (Coordinated Universal Time), i.e., two hours behind the logs and metrics. They should be synchronized if the user develops multimodal methods. Please read the IMPORTANT_experiment_start_end.txt file before working with the data.
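    As a minimal sketch of the required alignment (the timestamp value below is illustrative, not taken from the dataset), trace timestamps can be shifted from UTC into CEST before joining them with the logs and metrics:

    ```python
    from datetime import datetime, timezone, timedelta

    # CEST is UTC+2; logs and metrics are recorded in CEST, traces in UTC.
    CEST = timezone(timedelta(hours=2))

    def trace_to_cest(ts: str) -> str:
        """Convert a naive ISO UTC timestamp from the traces to CEST."""
        dt = datetime.fromisoformat(ts).replace(tzinfo=timezone.utc)
        return dt.astimezone(CEST).isoformat()

    print(trace_to_cest("2019-11-25T16:12:13"))  # 2019-11-25T18:12:13+02:00
    ```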

    Our GitHub repository with the code for the workloads and scripts for basic analysis can be found at: https://github.com/SashoNedelkoski/multi-source-observability-dataset/

  13. Financial Data Analysis Process

    • figshare.com
    xml
    Updated Jun 11, 2023
    Cite
    CONG LIU (2023). Financial Data Analysis Process [Dataset]. http://doi.org/10.6084/m9.figshare.23488436.v2
    Explore at:
    xml
    Available download formats
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    CONG LIU
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    · Financial expenses1 dataset: This dataset consists of simulated event logs generated from the financial expense data analysis process model. Each trace provides a detailed description of the process of analyzing office expense data.
    · Financial expenses2 dataset: This dataset consists of simulated event logs generated from the travel expense data analysis process model. Each trace provides a detailed description of the process of analyzing travel expense data.
    · Financial expenses3 dataset: This dataset consists of simulated event logs generated from the sales expense data analysis process model. Each trace provides a detailed description of the process of analyzing sales expense data.
    · Financial expenses4 dataset: This dataset consists of simulated event logs generated from the management expense data analysis process model. Each trace provides a detailed description of the process of analyzing management expense data.
    · Financial expenses5 dataset: This dataset consists of simulated event logs generated from the manufacturing expense data analysis process model. Each trace provides a detailed description of the process of analyzing manufacturing expense data.
    · Financial expenses6 dataset: This dataset consists of simulated event logs generated from the financial statement data analysis process model. Each trace provides a detailed description of the process of analyzing financial statement data.

  14. Security Monitoring and User Management Dataset

    • kaggle.com
    zip
    Updated Nov 23, 2024
    Cite
    Rasika Ekanayaka @ devLK (2024). Security Monitoring and User Management Dataset [Dataset]. https://www.kaggle.com/datasets/rasikaekanayakadevlk/security-monitoring-and-user-management-dataset
    Explore at:
    zip (2906638 bytes)
    Available download formats
    Dataset updated
    Nov 23, 2024
    Authors
    Rasika Ekanayaka @ devLK
    License

    Open Data Commons Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset consolidates data from multiple sources to provide a comprehensive view of security anomalies, insider threats, system updates, and user management. It includes information such as user behavior patterns, anomaly detection metrics, system update details, and user contact information. Designed for multi-dimensional analysis, the dataset is ideal for tasks like anomaly detection, insider threat assessment, system update tracking, and user data management in cybersecurity applications. Each record is enriched with timestamps and other relevant attributes to enable dynamic analysis and decision-making.

  15. Production Analysis with Process Mining Technology

    • data.4tu.nl
    • figshare.com
    zip
    Updated Jan 28, 2014
    Cite
    Dafna Levy (2014). Production Analysis with Process Mining Technology [Dataset]. http://doi.org/10.4121/uuid:68726926-5ac5-4fab-b873-ee76ea412399
    Explore at:
    zip
    Available download formats
    Dataset updated
    Jan 28, 2014
    Dataset provided by
    NooL - Integrating People & Solutions
    Authors
    Dafna Levy
    License

    https://doi.org/10.4121/resource:terms_of_use

    Description

    The comma-separated value dataset contains process data from a production process, including data on cases, activities, resources, timestamps, and other data fields.

  16. Synthetic Log Data of Distributed System

    • kaggle.com
    zip
    Updated Nov 22, 2023
    Cite
    Shubham Patil (2023). Synthetic Log Data of Distributed System [Dataset]. https://www.kaggle.com/datasets/shubhampatil1999/synthetic-log-data-of-distributed-system/versions/1
    Explore at:
    zip (1909074 bytes)
    Available download formats
    Dataset updated
    Nov 22, 2023
    Authors
    Shubham Patil
    Description

    This dataset captures logs from a distributed system, providing a comprehensive view of system behavior and performance. The logs encompass a range of activities, including system events, errors, and performance metrics, offering valuable insights for understanding and optimizing distributed system architectures.

    Content:
    1. File Format: CSV
    2. Column Description:
    - Timestamp: Records the date and time of each logged event in the format [2023-11-20T08:40:50.664842], providing a chronological sequence for system activities.
    - LogLevel: Indicates the severity or importance of the logged event, classifying entries into levels such as INFO, WARNING, ERROR, or FATAL, providing insights into the significance of system occurrences.
    - Service: Specifies the name or identifier of the service associated with each log entry, facilitating the categorization and analysis of events based on the distributed system's modular components.
    - Message: Contains descriptive information or details related to the logged event, offering insights into the nature and context of the distributed system activity.
    - RequestID: Uniquely identifies each request, enabling traceability and correlation of log entries associated with specific transactions or operations in the distributed system.
    - User: Represents the user associated with the logged event, providing information about the entity interacting with the distributed system and aiding in user-centric analysis.
    - ClientIP: Identifies the client or application associated with the logged event, facilitating tracking and analysis of activities performed by different clients in the distributed system.
    - TimeTaken: Records the duration, in milliseconds or another specified unit, indicating the time taken to complete the corresponding operation or transaction in the distributed system.
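    Assuming pandas and the column layout above, a first-pass aggregation might look like the following sketch; the rows are invented stand-ins, not actual dataset entries:

    ```python
    import pandas as pd

    # Tiny in-memory sample mirroring the dataset's columns (values invented).
    df = pd.DataFrame({
        "Timestamp": pd.to_datetime(["2023-11-20T08:40:50.664842",
                                     "2023-11-20T08:41:02.100000",
                                     "2023-11-20T08:41:10.250000"]),
        "LogLevel": ["INFO", "ERROR", "INFO"],
        "Service": ["auth", "auth", "billing"],
        "Message": ["login ok", "db timeout", "invoice sent"],
        "RequestID": ["r1", "r2", "r3"],
        "User": ["alice", "alice", "bob"],
        "ClientIP": ["10.0.0.1", "10.0.0.1", "10.0.0.2"],
        "TimeTaken": [12.5, 250.0, 40.0],
    })

    # Error rate and mean latency per service: typical first-pass
    # aggregations for error analysis and performance monitoring.
    error_rate = (df["LogLevel"] == "ERROR").groupby(df["Service"]).mean()
    mean_latency = df.groupby("Service")["TimeTaken"].mean()
    ```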

    Key Features:
    1. Error Analysis: Logs capture error messages and exceptions, facilitating the identification and resolution of issues within the distributed system.
    2. Performance Metrics: Explore performance-related metrics to assess system health, response times, and resource utilization.
    3. Temporal Patterns: Analyze temporal patterns and trends to understand system behavior over time.

    Potential Use Cases:
    1. Anomaly Detection: Leverage the dataset for anomaly detection algorithms to identify unusual patterns or behaviors.
    2. Performance Optimization: Use performance metrics to optimize resource allocation and improve overall system efficiency.
    3. Predictive Maintenance: Anticipate potential issues by analyzing historical logs, enabling proactive system maintenance.

  17. Data from: Traffic and Log Data Captured During a Cyber Defense Exercise

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1more
    Updated Jun 12, 2020
    Cite
    Daniel Tovarňák; Stanislav Špaček; Jan Vykopal (2020). Traffic and Log Data Captured During a Cyber Defense Exercise [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3746128
    Explore at:
    Dataset updated
    Jun 12, 2020
    Dataset provided by
    Masarykova Univerzita, Brno, CZ
    Authors
    Daniel Tovarňák; Stanislav Špaček; Jan Vykopal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was acquired during Cyber Czech – a hands-on cyber defense exercise (Red Team/Blue Team) held in March 2019 at Masaryk University, Brno, Czech Republic. Network traffic flows and a high variety of event logs were captured in an exercise network deployed in the KYPO Cyber Range Platform.

    Contents

    The dataset covers two distinct time intervals, which correspond to the official schedule of the exercise. The timestamps provided below are in the ISO 8601 date format.

    Day 1, March 19, 2019

    Start: 2019-03-19T11:00:00.000000+01:00

    End: 2019-03-19T18:00:00.000000+01:00

    Day 2, March 20, 2019

    Start: 2019-03-20T08:00:00.000000+01:00

    End: 2019-03-20T15:30:00.000000+01:00

    The captured and collected data were normalized into three distinct event types and they are stored as structured JSON. The data are sorted by a timestamp, which represents the time they were observed. Each event type includes a raw payload ready for further processing and analysis. The description of the respective event types and the corresponding data files follows.

    cz.muni.csirt.IpfixEntry.tgz – an archive of IPFIX traffic flows enriched with an additional payload of parsed application protocols in raw JSON.

    cz.muni.csirt.SyslogEntry.tgz – an archive of Linux Syslog entries with the payload of corresponding text-based log messages.

    cz.muni.csirt.WinlogEntry.tgz – an archive of Windows Event Log entries with the payload of original events in raw XML.

    Each archive listed above includes a directory of the same name with the following four files, ready to be processed.

    data.json.gz – the actual data entries in a single gzipped JSON file.

    dictionary.yml – data dictionary for the entries.

    schema.ddl – data schema for Apache Spark analytics engine.

    schema.jsch – JSON schema for the entries.
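    Because each data.json.gz stores its entries as gzipped JSON, they can be streamed without unpacking the whole file into memory. The sketch below demonstrates this on a synthetic two-line stand-in; it assumes one JSON object per line, and the field names are invented rather than taken from the dataset's dictionary.yml:

    ```python
    import gzip, json, os, tempfile

    def iter_events(path):
        """Yield events one by one, assuming one JSON object per line."""
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            for line in fh:
                yield json.loads(line)

    # Build a synthetic stand-in for e.g. cz.muni.csirt.SyslogEntry/data.json.gz.
    sample = [{"time": "2019-03-19T11:00:01.000000+01:00", "host": "web01"},
              {"time": "2019-03-19T11:00:02.000000+01:00", "host": "db01"}]
    path = os.path.join(tempfile.mkdtemp(), "data.json.gz")
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        fh.writelines(json.dumps(event) + "\n" for event in sample)

    events = list(iter_events(path))  # two events, in timestamp order
    ```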

    Finally, the exercise network topology is described in a machine-readable NetJSON format and it is a part of a set of auxiliary files archive – auxiliary-material.tgz – which includes the following.

    global-gateway-config.json – the network configuration of the global gateway in the NetJSON format.

    global-gateway-routing.json – the routing configuration of the global gateway in the NetJSON format.

    redteam-attack-schedule.{csv,odt} – the schedule of the Red Team attacks in CSV and ODT format. Source for Table 2.

    redteam-reserved-ip-ranges.{csv,odt} – the list of IP segments reserved for the Red Team in CSV and ODT format. Source for Table 1.

    topology.{json,pdf,png} – the topology of the complete Cyber Czech exercise network in the NetJSON, PDF and PNG format.

    topology-small.{pdf,png} – simplified topology in the PDF and PNG format. Source for Figure 1.

  18. Cloud-based User Entity Behavior Analytics Log Data Set

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 30, 2023
    Cite
    Max Landauer; Florian Skopik; Georg Höld; Markus Wurzenberger; Max Landauer; Florian Skopik; Georg Höld; Markus Wurzenberger (2023). Cloud-based User Entity Behavior Analytics Log Data Set [Dataset]. http://doi.org/10.5281/zenodo.7119953
    Explore at:
    zip
    Available download formats
    Dataset updated
    Oct 30, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Max Landauer; Florian Skopik; Georg Höld; Markus Wurzenberger; Max Landauer; Florian Skopik; Georg Höld; Markus Wurzenberger
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the CLUE-LDS (CLoud-based User Entity behavior analytics Log Data Set). The data set contains log events from real users utilizing a cloud storage, suitable for User Entity Behavior Analytics (UEBA). Events include logins, file accesses, link shares, config changes, etc. The data set contains around 50 million events generated by more than 5,000 distinct users over more than five years (2017-07-07 to 2022-09-29, i.e., 1,910 days). The data set is complete except for 109 events missing on 2021-04-22, 2021-08-20, and 2021-09-05 due to a database failure. The unpacked file size is around 14.5 GB. A detailed analysis of the data set is provided in [1].

    The logs are provided in JSON format with the following attributes in the first level:

    • id: Unique log line identifier that starts at 1 and increases incrementally, e.g., 1.
    • time: Time stamp of the event in ISO format, e.g., 2021-01-01T00:00:02Z.
    • uid: Unique anonymized identifier for the user generating the event, e.g., old-pink-crane-sharedealer.
    • uidType: Specifier for uid, which is either the user name or, for logged-out users, the IP address.
    • type: The action carried out by the user, e.g., file_accessed.
    • params: Additional event parameters (e.g., paths, groups) stored in a nested dictionary.
    • isLocalIP: Optional flag for event origin, which is either internal (true) or external (false).
    • role: Optional user role: consulting, administration, management, sales, technical, or external.
    • location: Optional IP-based geolocation of event origin, including city, country, longitude, latitude, etc.

    In the following data sample, the first object depicts a successful user login (see type: login_successful) and the second object depicts a file access (see type: file_accessed) from a remote location:

    {"params": {"user": "intact-gray-marlin-trademarkagent"}, "type": "login_successful", "time": "2019-11-14T11:26:43Z", "uid": "intact-gray-marlin-trademarkagent", "id": 21567530, "uidType": "name"}

    {"isLocalIP": false, "params": {"path": "/proud-copper-orangutan-artexer/doubtful-plum-ptarmigan-merchant/insufficient-amaranth-earthworm-qualitycontroller/curious-silver-galliform-tradingstandards/incredible-indigo-octopus-printfinisher/wicked-bronze-sloth-claimsmanager/frantic-aquamarine-horse-cleric"}, "type": "file_accessed", "time": "2019-11-14T11:26:51Z", "uid": "graceful-olive-spoonbill-careersofficer", "id": 21567531, "location": {"countryCode": "AT", "countryName": "Austria", "region": "4", "city": "Gmunden", "latitude": 47.915, "longitude": 13.7959, "timezone": "Europe/Vienna", "postalCode": "4810", "metroCode": null, "regionName": "Upper Austria", "isInEuropeanUnion": true, "continent": "Europe", "accuracyRadius": 50}, "uidType": "ipaddress"}
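    As a sketch of how such events might feed a UEBA baseline, the per-user action counts below are built from abbreviated versions of the two sample events (only the fields needed here are kept):

    ```python
    import json
    from collections import Counter

    def action_profile(lines):
        """Count (uid, type) pairs: a basic building block for behavior baselines."""
        profile = Counter()
        for line in lines:
            event = json.loads(line)
            profile[(event["uid"], event["type"])] += 1
        return profile

    # Abbreviated versions of the two sample events above.
    sample = [
        '{"type": "login_successful", "uid": "intact-gray-marlin-trademarkagent", '
        '"time": "2019-11-14T11:26:43Z", "id": 21567530, "uidType": "name"}',
        '{"type": "file_accessed", "uid": "graceful-olive-spoonbill-careersofficer", '
        '"time": "2019-11-14T11:26:51Z", "id": 21567531, "uidType": "ipaddress"}',
    ]
    profile = action_profile(sample)
    ```

    Deviations of a live profile from such a historical baseline (e.g., a sudden burst of file_accessed events for one uid) are the kind of signal the simulated account-hijacking evaluation described below is built on.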

    The data set was generated at the premises of Huemer Group, a midsize IT service provider located in Vienna, Austria. Huemer Group offers a range of Infrastructure-as-a-Service solutions for enterprises, including cloud computing and storage. In particular, their cloud storage solution called hBOX enables customers to upload their data, synchronize them with multiple devices, share files with others, create versions and backups of their documents, collaborate with team members in shared data spaces, and query the stored documents using search terms. The hBOX extends the open-source project Nextcloud with interfaces and functionalities tailored to the requirements of customers.

    The data set comprises only normal user behavior, but it can be used to evaluate anomaly detection approaches by simulating account hijacking. We provide an implementation for identifying similar users, switching pairs of users to simulate changes of behavior patterns, and a sample detection approach in our GitHub repository.

    Acknowledgements: Partially funded by the FFG project DECEPT (873980). The authors thank Walter Huemer, Oskar Kruschitz, Kevin Truckenthanner, and Christian Aigner from Huemer Group for supporting the collection of the data set.

    If you use the dataset, please cite the following publication:

    [1] M. Landauer, F. Skopik, G. Höld, and M. Wurzenberger. "A User and Entity Behavior Analytics Log Data Set for Anomaly Detection in Cloud Computing". 2022 IEEE International Conference on Big Data - 6th International Workshop on Big Data Analytics for Cyber Intelligence and Defense (BDA4CID 2022), December 17-20, 2022, Osaka, Japan. IEEE. [PDF]

  19. Metadata record for: Multivariate time series dataset for space weather data analytics

    • springernature.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Scientific Data Curation Team (2023). Metadata record for: Multivariate time series dataset for space weather data analytics [Dataset]. http://doi.org/10.6084/m9.figshare.12444884.v1
    Explore at:
    txt
    Available download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Scientific Data Curation Team
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains key characteristics about the data described in the Data Descriptor "Multivariate time series dataset for space weather data analytics". Contents:

    1. human-readable metadata summary table in CSV format
    2. machine-readable metadata file in JSON format
  20. Systems monitoring platform integrating artificial intelligence for incident response in servers - dataset

    • data.mendeley.com
    Updated Jan 22, 2025
    Cite
    Bruno Hiroshi Espinosa Luna (2025). Systems monitoring platform integrating artificial intelligence for incident response in servers - dataset [Dataset]. http://doi.org/10.17632/md7x42rcbm.1
    Explore at:
    Dataset updated
    Jan 22, 2025
    Authors
    Bruno Hiroshi Espinosa Luna
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the results of 311 tests conducted to evaluate a server monitoring system that integrates artificial intelligence for incident detection and analysis. The system uses tools such as Grafana and Prometheus for metric collection, Grafana Loki for log management, and the OpenAI API for log analysis. The dataset includes metrics on CPU usage, memory, storage, and service logs, as well as response times for alerts sent via Telegram and the GPT model's analysis.

Student Learning Interaction Logs Dataset

Engagement, performance & behavior data for adaptive learning systems


📊 Dataset Overview: Total Records: 9,000+ learning sessions

Total Students: 300

Total Features: 22

Data Format: CSV

Time-Series Ready: Yes (sequential session data per student)

💡 Use Cases: Analyze and visualize student learning patterns

Evaluate content engagement and session behavior

Develop personalized learning dashboards or analytics tools

Simulate adaptive feedback systems for digital education
