Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a variety of publicly available real-life event logs. For each event log, we derived two types of Petri nets with two state-of-the-art process miners: Inductive Miner (IM) and Split Miner (SM). Each event log/Petri net pair is intended for evaluating the scalability of existing conformance checking techniques. We used this dataset to evaluate the scalability of the S-Component approach for measuring fitness. The dataset contains tables of descriptive statistics of both process models and event logs. In addition, it includes time-performance results, measured in milliseconds, for several approaches in both multi-threaded and single-threaded executions. Finally, it contains a cost comparison of the different approaches and reports on the degree of over-approximation of the S-Component approach. The compared conformance checking techniques are described here: https://arxiv.org/abs/1910.09767.
Update: The dataset has been extended with the BPIC18 and BPIC19 event logs. BPIC19 is actually a collection of four different processes and was thus split into four event logs. For each of the five additional event logs, two process models were again mined with Inductive Miner and Split Miner. We used the extended dataset to test the scalability of our tandem-repeats approach for measuring fitness. The dataset now contains updated tables of log and model statistics, as well as tables of the conducted experiments measuring execution time and raw fitness cost of various fitness approaches. The compared conformance checking techniques are described here: https://arxiv.org/abs/2004.01781.
Update: The dataset has also been used to measure the scalability of a new Generalization measure based on concurrent and repetitive patterns. A concurrency oracle is used in tandem with partial orders to identify concurrent patterns in the log, which are tested against parallel blocks in the process model. Tandem repeats are used with various trace reductions and extensions to define repetitive patterns in the log, which are tested against loops in the process model. Each pattern is assigned a partial fulfillment. The generalization is then the average of pattern fulfillments, weighted by the number of traces in which each pattern was observed. The dataset now includes the time results and a breakdown of Generalization values.
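The final aggregation step of the Generalization measure (a trace-count-weighted average of pattern fulfillments) can be sketched in Python. The pattern names and values below are illustrative only; the actual extraction of concurrent and repetitive patterns via the concurrency oracle and tandem repeats is described in the linked paper.

```python
# Sketch of the aggregation step: each detected pattern carries a partial
# fulfillment in [0, 1], weighted by the number of traces in which it was
# observed. Pattern values here are made up for illustration.
def generalization(patterns):
    """patterns: list of (fulfillment, trace_count) pairs."""
    total_weight = sum(count for _, count in patterns)
    if total_weight == 0:
        return 0.0
    return sum(f * count for f, count in patterns) / total_weight

patterns = [
    (1.0, 40),  # concurrent pattern fully matched by a parallel block
    (0.5, 10),  # repetitive pattern partially matched by a loop
]
print(generalization(patterns))  # 0.9
```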
Terms of use: https://doi.org/10.4121/resource:terms_of_use
The comma separated value dataset contains process data from a production process, including data on cases, activities, resources, timestamps and more data fields.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This study analyses an event log, automatically generated by the CeLOE LMS, that records student and lecturer activities in learning. The event log is mined to obtain a process model representing learning behaviours of the lecturers and students during the learning process. The case study in this research is learning in the study program 365 during the first semester of 2020/2021.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DEPRECATED - current version: https://figshare.com/articles/dataset/Dataset_An_IoT-Enriched_Event_Log_for_Process_Mining_in_Smart_Factories/20130794
Modern technologies such as the Internet of Things (IoT) are becoming increasingly important in various domains, including Business Process Management (BPM) research. One main research area in BPM is process mining, which can be used to analyze event logs, e.g., for checking the conformance of running processes. However, only a few IoT-based event logs are available for research purposes. Some of them are artificially generated and do not always completely reflect the actual physical properties of smart environments. In this paper, we present an IoT-enriched XES event log that is generated by a physical smart factory. For this purpose, we created the DataStream XES extension for representing IoT data in event logs. Finally, we present some preliminary analysis and properties of the log.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains results of the experiment to analyze information preservation and recovery by different event log abstractions in process mining described in: Sander J.J. Leemans, Dirk Fahland "Information-Preserving Abstractions of Event Data in Process Mining" Knowledge and Information Systems, ISSN: 0219-1377 (Print) 0219-3116 (Online), accepted May 2019
The experiment results were obtained with: https://doi.org/10.5281/zenodo.3243981
Terms of use: https://doi.org/10.4121/resource:terms_of_use
XES software event log obtained through instrumenting JUnit 4.12 using the tool available at {https://svn.win.tue.nl/repos/prom/XPort/}. This event log contains method-call level events describing a single run of the JUnit 4.12 software, available at {https://mvnrepository.com/artifact/junit/junit/4.12} , using the input from {https://github.com/junit-team/junit4/wiki/Getting-started}. Note that the life-cycle information in this log corresponds to method call (start) and return (complete), and captures a method-call hierarchy.
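Because each event carries a start/complete life-cycle transition, the method-call hierarchy can be reconstructed with a simple stack. A minimal sketch follows; the `(method, lifecycle)` tuple format is an assumption for illustration, since the real log stores these as XES event attributes.

```python
# Reconstruct a method-call tree from (method, lifecycle) events, where
# 'start' marks a call and 'complete' marks its return. The event tuple
# format is illustrative; a real XES log stores these as event attributes.
def build_call_tree(events):
    root = {"method": "<root>", "children": []}
    stack = [root]
    for method, lifecycle in events:
        if lifecycle == "start":
            node = {"method": method, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif lifecycle == "complete":
            assert stack[-1]["method"] == method, "unbalanced start/complete"
            stack.pop()
    return root

events = [
    ("main", "start"),
    ("runTest", "start"), ("runTest", "complete"),
    ("main", "complete"),
]
tree = build_call_tree(events)
print(tree["children"][0]["method"])                  # main
print(tree["children"][0]["children"][0]["method"])   # runTest
```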
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Converted to OCEL 1.0 JSONOCEL and OCEL 2.0 XML from traditional event logs available at: Zenodo - Record 8059489.
Object Types: { Person }
Person-level Attributes:
Event-level Attributes:
* Hiring
The data describes a multifaceted recruitment process with diverse application pathways ranging from minimal processing to extensive multi-step procedures. The variability of these routes, largely dependent on numerous determinants, yields a spectrum of outcomes from instant rejection to successful job offers.
The logs include attributes such as age, citizenship, German proficiency, gender, religion, and years of education. While these attributes may inform candidate profiles, their misuse could engender discrimination. Variables like age and education may signify experience and skills, citizenship and German language may address job logistics, but these should not unjustly eliminate applicants. Gender and religion, unrelated to job performance, must not sway hiring. Therefore, the use of these attributes must uphold fairness, avoiding any potential bias.
* Hospital
The data depicts a hospital treatment process that commences with registration at an Emergency Room or Family Department and advances through stages of examination, diagnosis, and treatment. Notably, unsuccessful treatments often entail repetitive diagnostic and treatment cycles, underscoring the iterative nature of healthcare provision.
The logs incorporate patient attributes such as age, underlying condition, citizenship, German language proficiency, gender, and private insurance. These attributes, influencing the treatment process, may unveil potential discrimination. Factors like age and condition might affect case complexity and treatment path, while citizenship may highlight healthcare access disparities. German proficiency can impact provider-patient communication, thus affecting care quality. Gender could spotlight potential health disparities, while insurance status might indicate socio-economic influences on care quality or timeliness. Therefore, a comprehensive examination of these attributes vis-a-vis the treatment process could shed light on potential biases or disparities, fostering fairness in healthcare delivery.
* Lending
This data illustrates the steps within a loan application process. From an initial appointment request, the process navigates various stages, including information verification and underwriting, culminating in loan approval or denial. Additional steps may be required, such as co-signer enlistment or collateral assessment. Some cases experience outright appointment denial, indicating the process's variability, reflecting applicants' differing credit situations.
The logs' attributes can aid in identifying influences on outcomes and detecting discrimination. Personal characteristics ('age', 'citizen', 'German speaking', and 'gender') and socio-economic indicators ('yearsOfEducation' and 'CreditScore') can impact the process. While 'yearsOfEducation' and 'CreditScore' can validly inform creditworthiness, 'age', 'citizen', 'German speaking', and 'gender' should not bias loan decisions. Ensuring these attributes are used responsibly fosters equitable loan processes.
* Renting
The data represents a rental process. It begins with a prospective tenant applying to view a property. Subsequent steps include an initial screening phase, viewing, decision-making, and a potential extensive screening. The process ends with the acceptance or rejection of the prospective tenant. In some cases, a tenant may apply for viewing but be rejected without the viewing occurring.
The logs contain attributes that can shed light on potential biases in the process. 'Age', 'citizen', 'German speaking', 'gender', 'religious affiliation', and 'yearsOfEducation' might influence the rental process, leading to potential discrimination. While some attributes may provide useful insights into a potential tenant's reliability, misuse could result in discrimination. Thus, fairness must be observed in utilizing these attributes to avoid potential biases and ensure equitable treatment.
Terms of use: https://doi.org/10.4121/resource:terms_of_use
Real life log of a Dutch academic hospital, originally intended for use in the first Business Process Intelligence Contest (BPIC 2011)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a simplified excerpt from a real event log that tracks the trajectories of patients admitted to a hospital to be treated for sepsis, a life-threatening condition. The log has been recorded by the Enterprise Resource Planning of the hospital. Additionally, the dataset contains three synthetic logs that increase the number of trajectories within the original log timespan, while maintaining other statistical characteristics.
In total, the dataset contains four files in .zip format and a companion document that describes both the statistical method used to synthesize the logs and the dataset content in detail. The dataset can be used to test the performance of event-based process mining and log (runtime) monitoring tools against an increasing load of events.
Terms of use: https://doi.org/10.4121/resource:terms_of_use
This collection contains the event logs and process models described and used in the paper "The Imprecisions of Precision Measures in Process Mining"
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
These datasets are used for evaluating the process mining-based goal recognition (GR) system proposed in the paper "Fast and Accurate Data-Driven Goal Recognition Using Process Mining Techniques." The datasets include a running example, an evaluation dataset for synthetic domains, and real-world business logs.
running_example.tar.bz contains the traces shown in Figure 2 of the paper for learning six skill models toward six goal candidates, as well as the three walks shown in Figure 1.a.
synthetic_domains.tar.bz2 is the dataset for evaluating the GR system in synthetic domains (IPC domains). There are two types of traces used for learning skill models: those generated by the top-k planner and those generated by the diverse planner. Please extract the archived domains located in topk/ and diverse/. In each domain, the sub-folder problems/ contains the dataset for learning skill models, and the sub-folder test/ contains the traces (plans) for testing GR performance. There are five levels of observation: 10%, 30%, 50%, 70%, and 100%. For each level of observation there are multiple problem instances, with instance IDs starting from 0. A problem instance contains the synthetic domain model (PDDL files), training traces (in train/), and an observation for testing (obs.dat). The top-k and diverse planners for generating traces can be accessed here. The original PDDL models of the problem instances for the 15 IPC domains mentioned in the paper are available here.
business_logs.tar.bz is the dataset for evaluating the GR system in real-world domains. There are two types of problem instances: one with only two goal candidates (yes or no), referred to as "binary," and the other containing multiple goal candidates, termed "multiple." Please extract the archived files located in the directories binary/ and multiple/. The traces for learning the skill models can be found in the XES files, and the traces (plans) for testing can be found in the directory goal*/.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data correspond to the set of problems for evaluating the proposal detailed in Martínez-Rojas et al. 2022. The evaluation utilizes a set of synthetic problems that simulate realistic administrative use cases. Each problem includes a UI Log with a synthetic screenshot corresponding to each event, capturing 3 distinct processes (P) marked by varying complexity levels. These levels are defined by the number of activities, the process execution variants, and the visual features influencing decisions between these variants.
The implementation of this proposal can be found in the tool available at this GitHub repository, which utilizes the logs of these 3 processes for validation. Here they are described:
These processes all contain a single decision point, although the one in P3 is complex. All processes include
To generate the objects for the evaluation, we generate event logs of different sizes (|L|) for each of these processes by deriving events from the sample event log. We consider log sizes in the range {10, 25, 50, 100} events. Note that we consider only complete instances in the log; thus, we remove the last instance if it goes beyond |L|.
Some of these logs are generated with a balanced number of instances, while others are unbalanced (B?), meaning the frequencies of the most and least frequent variants differ by more than 20%. To average the results over a collection of problems, 30 instances are randomly generated for each tuple <P, |L|, B?>.
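The truncation rule (keep only complete instances within the event budget |L|) can be sketched as follows; the case representation is an assumption for illustration.

```python
# Keep whole process instances (cases) in order, dropping the trailing
# instance(s) once including the next one would exceed the event budget
# |L|. The list-of-event-lists case structure is illustrative.
def truncate_log(cases, max_events):
    """cases: list of event lists, one list per complete instance."""
    out, used = [], 0
    for case in cases:
        if used + len(case) > max_events:
            break  # this instance would go beyond |L|, so it is removed
        out.append(case)
        used += len(case)
    return out

cases = [["a", "b", "c"], ["a", "c"], ["a", "b", "b", "c"]]
kept = truncate_log(cases, 6)
print(len(kept))                  # 2
print(sum(len(c) for c in kept))  # 5
```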
In this dataset there are 3 zips, one for each family. Each family corresponds to a process:
Within these folders, there are 30 different scenarios (folders), in which the look and feel of the applications shown in the screenshots exhibits slight variations. Within each scenario, variations are made to the data entered in the forms and to the images or attachments present in the user interface, generating log instances according to the characteristics of each process.
For each scenario, there are 8 folders with the concrete problems, each defined by log size (in {10, 25, 50, 100}) and balance (Balanced or Unbalanced). The names of these folders follow the format Family_LogSize_Balanced.
Inside each problem folder the UI Log and the screen captures can be found.
References
Martínez-Rojas, A., Jiménez-Ramírez, A., Enríquez, J. G., & Reijers, H. A. (2022, September). Analyzing variable human actions for robotic process automation. In International Conference on Business Process Management (pp. 75-90). Cham: Springer International Publishing.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A set of event logs of 101 blockchain-based applications (DApps). For each DApp, there are two event log files: a raw version, in which the data is encoded as stored on the blockchain, and a decoded version, in which the data is decoded into a human-readable format. If a DApp has multiple versions on different blockchain networks, there are two event log files (encoded and decoded) for each version. In addition, the event registry file includes a comprehensive list of event names and their corresponding signatures obtained from the contract ABIs of the 101 DApps.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The anonymized event logs of the experiments in the paper "Differentially Private Release of Event Logs for Process Mining"
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this archive, we provide supplementary material for our paper entitled "Mine Me but Don't Single Me Out: Differentially Private Event Logs for Process Mining". We list the selected event logs and their characteristics and descriptive statistics. This archive also contains the anonymized event logs resulting from the experiments. The source code is available on GitHub.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the correlation data for species-coverage-based log representativeness measure and Trace-based Log Representativeness Approximation (TLRA) across event logs of 60 generative systems and varying log sizes and noise levels.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Extensible Event Stream (XES) software event log obtained through instrumenting the Statechart Workbench ProM plugin using the tool available at {https://svn.win.tue.nl/repos/prom/XPort/}. This event log contains method-call level events describing a workbench run invoking the Alignments algorithm using the BPI Challenge 2012 event log available and documented at {https://doi.org/10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f} . Note that the life-cycle information in this log corresponds to method call (start) and return (complete), and captures a method-call hierarchy.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These 60 event logs vary in the number of cases and the density of overlapping cases. Each log has the following event attributes: event id, case id, activity, timestamp, loan type, amount, resources, and status. BPMN scenarios were used to simulate the process.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General Description
Our company sells goods overseas. After receiving an order, the shipment of goods is scheduled. According to this schedule, the goods are picked up from the local production site and brought to a terminal where a logistics service provider receives and ships them.
This is an artificial event log according to the OCEL 2.0 Standard simulated using CPN-Tools. Both the CPN and the SQLite can be downloaded.
Process Overview
From a customer order perspective, the process begins when the order is registered at our company (register customer order). After registration, a transport document is created in which details of the further process are recorded (create transport document).
Using this information, the logistics service provider is contacted to coordinate the transport of the ordered goods to the seaport. Twice a week, that provider sends a vehicle to a terminal, with a limited capacity for containers of ordered goods to be transported from the terminal to a seaport. For our company, available capacities vary from vehicle to vehicle, as we are not the only company booking spots. Once the logistics service provider receives our transport documents, they book capacities according to availability and container prioritizations in the upcoming weeks (book vehicles). Once the dates for transporting the goods to the terminal are set, our company contacts a container depot to reserve the required containers (order empty containers).
When a container’s vehicle departure approaches, the goods are prepared, packed and shipped to the terminal. For this purpose, a truck is sent to the container depot (pick up empty container). Meanwhile, the ordered goods to be shipped are packed into handling units at the production site. After loading the handling units (load truck), the truck drives the full container to the terminal (drive to terminal).
At the terminal, the container is picked up by a free forklift and weighed (weigh). Unless the vehicle departure is imminent, the container is placed in the storage location at the terminal (place in stock). Finally, it is moved to the vehicle (bring to loading bay, load to vehicle) which departs at a fixed time (depart).
Despite careful planning, containers sometimes miss a vehicle’s departure. In this case, the container is rescheduled to the next possible vehicle (reschedule container) and kept near the loading ramp until then.
Further information can be found at: https://www.ocel-standard.org/beta/event-logs/simulations/logistics/
General Properties
An overview of log properties is given below.
| Property | Value |
|---|---|
| Event Types | 14 |
| Object Types | 7 |
| Events | 35761 |
| Objects | 14013 |
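Counts like those in the table above can be computed directly from the SQLite export with Python's standard sqlite3 module. In this sketch, the table and column names (`event`, `object`, `ocel_id`, `ocel_type`) are my reading of the OCEL 2.0 relational schema and should be verified against the specification; a tiny in-memory stand-in is built here instead of opening the real log file.

```python
import sqlite3

# Build a tiny in-memory stand-in for an OCEL 2.0 SQLite export. The
# table/column names (event, object, ocel_id, ocel_type) are assumptions
# based on the OCEL 2.0 relational schema; check them against the spec
# before running this on the actual log.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE event  (ocel_id TEXT, ocel_type TEXT);
    CREATE TABLE object (ocel_id TEXT, ocel_type TEXT);
    INSERT INTO event  VALUES ('e1', 'register customer order'),
                              ('e2', 'create transport document');
    INSERT INTO object VALUES ('o1', 'Customer Order'),
                              ('o2', 'Transport Document');
""")

def ocel_summary(con):
    """Count events, objects, and their distinct types."""
    q = lambda sql: con.execute(sql).fetchone()[0]
    return {
        "events":       q("SELECT COUNT(*) FROM event"),
        "objects":      q("SELECT COUNT(*) FROM object"),
        "event types":  q("SELECT COUNT(DISTINCT ocel_type) FROM event"),
        "object types": q("SELECT COUNT(DISTINCT ocel_type) FROM object"),
    }

print(ocel_summary(con))
```

On the real log, the same queries should reproduce the property table (35761 events, 14013 objects, 14 event types, 7 object types), provided the schema assumption holds.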
Control-Flow Behavior
The behavior of the log is described by a respective object-centric Petri net. Also, individual object types exhibit behavior that can be described by simpler Petri nets. See below.
| Container | Transport Documents |
| Customer Order | Truck |
| Forklift | Vehicle |
| Handling Unit |
Object Relationships
During the process, object-to-object relations can emerge at activity occurrences as follows.
| Activity | Source Object Type | Target Object Type | Qualifier |
|---|---|---|---|
| Create Transport Document | Customer Order | Transport Document | TD for CO |
| Book Vehicle | Transport Document | Vehicle | Regular VH for TD |
| Book Vehicle | Transport Document | Vehicle | High-Prio VH for TD |
| Order Empty Containers | Transport Document | Container | CR for TD |
| Pick Empty Container | Truck | Container | TR loads CR |
| Load Truck | Container | Handling Unit | CR contains HU |
| Reschedule Container | Transport Document | Vehicle | Substitute VH for TD |
Simulation Model
The CPN used to create this event log can also be downloaded. To obtain simulated data, extract the linked ZIP file and play out the CPN therein, e.g., by using CPN Tools.
The play-out produces CSV files according to the schema of OCEL2.0. This Python notebook can be used to convert these files to an SQLite dump.
For a technical documentation of the simulation model, please open the attached CPN with CPN Tools and see the annotations therein.
Acknowledgements
Funded under the Excellence Strategy of the Federal Government and the Länder. Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy - EXC-2023 Internet of Production - 390621612. We also thank the Alexander von Humboldt (AvH) Stiftung for supporting our research.
Heuristics Miner