Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a variety of publicly available real-life event logs. For each event log, we derived two types of Petri nets with two state-of-the-art process miners: Inductive Miner (IM) and Split Miner (SM). Each event log/Petri net pair is intended for evaluating the scalability of existing conformance checking techniques. We used this dataset to evaluate the scalability of the S-Component approach for measuring fitness. The dataset contains tables of descriptive statistics of both process models and event logs. In addition, it includes time-performance results, measured in milliseconds, for several approaches in both multi-threaded and single-threaded executions. Finally, it contains a cost comparison of the different approaches and reports on the degree of over-approximation of the S-Component approach. The compared conformance checking techniques are described here: https://arxiv.org/abs/1910.09767.
Update: The dataset has been extended with the BPIC18 and BPIC19 event logs. BPIC19 is actually a collection of four different processes and was thus split into four event logs. For each of the five additional event logs, two process models were again mined with Inductive Miner and Split Miner. We used the extended dataset to test the scalability of our tandem-repeats approach for measuring fitness. The dataset now contains updated tables of log and model statistics, as well as tables of the conducted experiments measuring execution time and raw fitness cost of various fitness approaches. The compared conformance checking techniques are described here: https://arxiv.org/abs/2004.01781.
Update: The dataset has also been used to measure the scalability of a new Generalization measure based on concurrent and repetitive patterns. A concurrency oracle is used in tandem with partial orders to identify concurrent patterns in the log, which are tested against parallel blocks in the process model. Tandem repeats are used with various trace reductions and extensions to define repetitive patterns in the log, which are tested against loops in the process model. Each pattern is assigned a partial fulfillment. The generalization is then the average of pattern fulfillments, weighted by the number of traces in which each pattern was observed. The dataset now includes the time results and a breakdown of Generalization values.
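The final aggregation step of the Generalization measure (a trace-count-weighted average of pattern fulfillments) can be sketched in Python. The pattern names and values below are illustrative only; the actual extraction of concurrent and repetitive patterns via the concurrency oracle and tandem repeats is described in the linked paper.

```python
# Sketch of the aggregation step: each detected pattern carries a partial
# fulfillment in [0, 1], weighted by the number of traces in which it was
# observed. Pattern values here are made up for illustration.
def generalization(patterns):
    """patterns: list of (fulfillment, trace_count) pairs."""
    total_weight = sum(count for _, count in patterns)
    if total_weight == 0:
        return 0.0
    return sum(f * count for f, count in patterns) / total_weight

patterns = [
    (1.0, 40),  # concurrent pattern fully matched by a parallel block
    (0.5, 10),  # repetitive pattern partially matched by a loop
]
print(generalization(patterns))  # 0.9
```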
Terms of use: https://doi.org/10.4121/resource:terms_of_use
The comma separated value dataset contains process data from a production process, including data on cases, activities, resources, timestamps and more data fields.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This study analyses an event log, automatically generated by the CeLOE LMS, that records student and lecturer activities in learning. The event log is mined to obtain a process model representing learning behaviours of the lecturers and students during the learning process. The case study in this research is learning in the study program 365 during the first semester of 2020/2021.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DEPRECATED - current version: https://figshare.com/articles/dataset/Dataset_An_IoT-Enriched_Event_Log_for_Process_Mining_in_Smart_Factories/20130794
Modern technologies such as the Internet of Things (IoT) are becoming increasingly important in various domains, including Business Process Management (BPM) research. One main research area in BPM is process mining, which can be used to analyze event logs, e.g., for checking the conformance of running processes. However, only a few IoT-based event logs are available for research purposes. Some of them are artificially generated and do not always completely reflect the actual physical properties of smart environments. In this paper, we present an IoT-enriched XES event log that is generated by a physical smart factory. For this purpose, we created the DataStream XES extension for representing IoT data in event logs. Finally, we present some preliminary analysis and properties of the log.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains results of the experiment to analyze information preservation and recovery by different event log abstractions in process mining described in: Sander J.J. Leemans, Dirk Fahland "Information-Preserving Abstractions of Event Data in Process Mining" Knowledge and Information Systems, ISSN: 0219-1377 (Print) 0219-3116 (Online), accepted May 2019
The experiment results were obtained with: https://doi.org/10.5281/zenodo.3243981
Terms of use: https://doi.org/10.4121/resource:terms_of_use
XES software event log obtained through instrumenting JUnit 4.12 using the tool available at {https://svn.win.tue.nl/repos/prom/XPort/}. This event log contains method-call level events describing a single run of the JUnit 4.12 software, available at {https://mvnrepository.com/artifact/junit/junit/4.12} , using the input from {https://github.com/junit-team/junit4/wiki/Getting-started}. Note that the life-cycle information in this log corresponds to method call (start) and return (complete), and captures a method-call hierarchy.
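Because each event carries a start/complete life-cycle transition, the method-call hierarchy can be reconstructed with a simple stack. A minimal sketch follows; the `(method, lifecycle)` tuple format is an assumption for illustration, since the real log stores these as XES event attributes.

```python
# Reconstruct a method-call tree from (method, lifecycle) events, where
# 'start' marks a call and 'complete' marks its return. The event tuple
# format is illustrative; a real XES log stores these as event attributes.
def build_call_tree(events):
    root = {"method": "<root>", "children": []}
    stack = [root]
    for method, lifecycle in events:
        if lifecycle == "start":
            node = {"method": method, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif lifecycle == "complete":
            assert stack[-1]["method"] == method, "unbalanced start/complete"
            stack.pop()
    return root

events = [
    ("main", "start"),
    ("runTest", "start"), ("runTest", "complete"),
    ("main", "complete"),
]
tree = build_call_tree(events)
print(tree["children"][0]["method"])                  # main
print(tree["children"][0]["children"][0]["method"])   # runTest
```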
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Converted to OCEL 1.0 JSONOCEL and OCEL 2.0 XML from traditional event logs available at: Zenodo - Record 8059489.
Object Types: { Person }
Person-level Attributes:
Event-level Attributes:
* Hiring
The data describes a multifaceted recruitment process with diverse application pathways ranging from minimal processing to extensive multi-step procedures. The variability of these routes, largely dependent on numerous determinants, yields a spectrum of outcomes from instant rejection to successful job offers.
The logs include attributes such as age, citizenship, German proficiency, gender, religion, and years of education. While these attributes may inform candidate profiles, their misuse could engender discrimination. Variables like age and education may signify experience and skills, citizenship and German language may address job logistics, but these should not unjustly eliminate applicants. Gender and religion, unrelated to job performance, must not sway hiring. Therefore, the use of these attributes must uphold fairness, avoiding any potential bias.
* Hospital
The data depicts a hospital treatment process that commences with registration at an Emergency Room or Family Department and advances through stages of examination, diagnosis, and treatment. Notably, unsuccessful treatments often entail repetitive diagnostic and treatment cycles, underscoring the iterative nature of healthcare provision.
The logs incorporate patient attributes such as age, underlying condition, citizenship, German language proficiency, gender, and private insurance. These attributes, influencing the treatment process, may unveil potential discrimination. Factors like age and condition might affect case complexity and treatment path, while citizenship may highlight healthcare access disparities. German proficiency can impact provider-patient communication, thus affecting care quality. Gender could spotlight potential health disparities, while insurance status might indicate socio-economic influences on care quality or timeliness. Therefore, a comprehensive examination of these attributes vis-a-vis the treatment process could shed light on potential biases or disparities, fostering fairness in healthcare delivery.
* Lending
This data illustrates the steps within a loan application process. From an initial appointment request, the process navigates various stages, including information verification and underwriting, culminating in loan approval or denial. Additional steps may be required, such as co-signer enlistment or collateral assessment. Some cases experience outright appointment denial, indicating the process's variability, reflecting applicants' differing credit situations.
The logs' attributes can aid in identifying influences on outcomes and detecting discrimination. Personal characteristics ('age', 'citizen', 'German speaking', and 'gender') and socio-economic indicators ('yearsOfEducation' and 'CreditScore') can impact the process. While 'yearsOfEducation' and 'CreditScore' can validly inform creditworthiness, 'age', 'citizen', 'German speaking', and 'gender' should not bias loan decisions. Ensuring these attributes are used responsibly fosters equitable loan processes.
* Renting
The data represents a rental process. It begins with a prospective tenant applying to view a property. Subsequent steps include an initial screening phase, viewing, decision-making, and a potential extensive screening. The process ends with the acceptance or rejection of the prospective tenant. In some cases, a tenant may apply for viewing but be rejected without the viewing occurring.
The logs contain attributes that can shed light on potential biases in the process. 'Age', 'citizen', 'German speaking', 'gender', 'religious affiliation', and 'yearsOfEducation' might influence the rental process, leading to potential discrimination. While some attributes may provide useful insights into a potential tenant's reliability, misuse could result in discrimination. Thus, fairness must be observed in utilizing these attributes to avoid potential biases and ensure equitable treatment.
Terms of use: https://doi.org/10.4121/resource:terms_of_use
Real life log of a Dutch academic hospital, originally intended for use in the first Business Process Intelligence Contest (BPIC 2011)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a simplified excerpt from a real event log that tracks the trajectories of patients admitted to a hospital to be treated for sepsis, a life-threatening condition. The log has been recorded by the Enterprise Resource Planning of the hospital. Additionally, the dataset contains three synthetic logs that increase the number of trajectories within the original log timespan, while maintaining other statistical characteristics.
In total, the dataset contains four files in .zip format and a companion document that describes both the statistical method used to synthesize the logs and the dataset content in detail. The dataset can be used to test the performance of event-based process mining and log (runtime) monitoring tools against an increasing load of events.
Terms of use: https://doi.org/10.4121/resource:terms_of_use
This collection contains the event logs and process models described and used in the paper "The Imprecisions of Precision Measures in Process Mining"
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
These datasets are used for evaluating the process mining-based goal recognition (GR) system proposed in the paper "Fast and Accurate Data-Driven Goal Recognition Using Process Mining Techniques." The datasets include a running example, an evaluation dataset for synthetic domains, and real-world business logs.
running_example.tar.bz contains the traces shown in Figure 2 of the paper for learning six skill models toward six goal candidates, as well as the three walks shown in Figure 1.a.
synthetic_domains.tar.bz2 is the dataset for evaluating the GR system in synthetic domains (IPC domains). There are two types of traces used for learning skill models: those generated by the top-k planner and those generated by the diverse planner. Please extract the archived domains located in topk/ and diverse/. In each domain, the sub-folder problems/ contains the dataset for learning skill models, and the sub-folder test/ contains the traces (plans) for testing GR performance. There are five levels of observation: 10%, 30%, 50%, 70%, and 100%. For each level of observation there are multiple problem instances, with instance IDs starting from 0. A problem instance contains the synthetic domain model (PDDL files), training traces (in train/), and an observation for testing (obs.dat). The top-k and diverse planners for generating traces can be accessed here. The original PDDL models of the problem instances for the 15 IPC domains mentioned in the paper are available here.
business_logs.tar.bz is the dataset for evaluating the GR system in real-world domains. There are two types of problem instances: one with only two goal candidates (yes or no), referred to as "binary," and the other containing multiple goal candidates, termed "multiple." Please extract the archived files located in the directories binary/ and multiple/. The traces for learning the skill models can be found in the XES files, and the traces (plans) for testing can be found in the directory goal*/.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data correspond to the set of problems for evaluating the proposal detailed in Martínez-Rojas et al. 2022. The evaluation utilizes a set of synthetic problems that simulate realistic administrative use cases. Each problem includes a UI Log with a synthetic screenshot corresponding to each event, capturing 3 distinct processes (P) marked by varying complexity levels. These levels are defined by the number of activities, the process execution variants, and the visual features influencing decisions between these variants.
The implementation of this proposal can be found in the tool available at this GitHub repository, which utilizes the logs of these 3 processes for validation. Here they are described:
These processes all contain a single decision point, although the one in P3 is complex. All processes include
To generate the objects for the evaluation, we generate event logs of different sizes (|L|) for each of these processes by deriving events from the sample event log. We consider log sizes in the range {10, 25, 50, 100} events. Note that we consider only complete instances in the log; thus, we remove the last instance if it goes beyond |L|.
Some of these logs are generated with a balanced number of instances, while others are unbalanced (B?), meaning the frequencies of the most and least frequent variants differ by more than 20%. To average the results over a collection of problems, 30 instances are randomly generated for each tuple <P, |L|, B?>.
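The truncation rule (keep only complete instances within the event budget |L|) can be sketched as follows; the case representation is an assumption for illustration.

```python
# Keep whole process instances (cases) in order, dropping the trailing
# instance(s) once including the next one would exceed the event budget
# |L|. The list-of-event-lists case structure is illustrative.
def truncate_log(cases, max_events):
    """cases: list of event lists, one list per complete instance."""
    out, used = [], 0
    for case in cases:
        if used + len(case) > max_events:
            break  # this instance would go beyond |L|, so it is removed
        out.append(case)
        used += len(case)
    return out

cases = [["a", "b", "c"], ["a", "c"], ["a", "b", "b", "c"]]
kept = truncate_log(cases, 6)
print(len(kept))                  # 2
print(sum(len(c) for c in kept))  # 5
```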
In this dataset there are 3 zips, one for each family. Each family corresponds to a process:
Within these folders, there are 30 different scenarios (folders), in which the look and feel of the applications shown in the screenshots exhibits slight variations. Within each scenario, variations are made to the data entered in the forms and to the images or attachments present in the user interface, generating log instances according to the characteristics of each process.
For each scenario, there are 8 folders with the concrete problems, each defined by log size (in {10, 25, 50, 100}) and balance (Balanced or Unbalanced). The names of these folders follow the format Family_LogSize_Balanced.
Inside each problem folder the UI Log and the screen captures can be found.
References
Martínez-Rojas, A., Jiménez-Ramírez, A., Enríquez, J. G., & Reijers, H. A. (2022, September). Analyzing variable human actions for robotic process automation. In International Conference on Business Process Management (pp. 75-90). Cham: Springer International Publishing.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A set of event logs of 101 blockchain-based applications (DApps). For each DApp, there are two event log files: a raw version, in which the data is encoded as stored on the blockchain, and a decoded version, in which the data is decoded into a human-readable format. If a DApp has multiple versions on different blockchain networks, there are two event log files (encoded and decoded) for each version. In addition, the event registry file includes a comprehensive list of event names and their corresponding signatures obtained from the contract ABIs of the 101 DApps.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The anonymized event logs of the experiments in the paper "Differentially Private Release of Event Logs for Process Mining"
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this archive, we provide supplementary material for our paper entitled "Mine Me but Don't Single Me Out: Differentially Private Event Logs for Process Mining". We list the selected event logs and their characteristics and descriptive statistics. This archive also contains the anonymized event logs resulting from the experiments. The source code is available on GitHub.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the correlation data for species-coverage-based log representativeness measure and Trace-based Log Representativeness Approximation (TLRA) across event logs of 60 generative systems and varying log sizes and noise levels.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Extensible Event Stream (XES) software event log obtained through instrumenting the Statechart Workbench ProM plugin using the tool available at {https://svn.win.tue.nl/repos/prom/XPort/}. This event log contains method-call level events describing a workbench run invoking the Alignments algorithm using the BPI Challenge 2012 event log available and documented at {https://doi.org/10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f} . Note that the life-cycle information in this log corresponds to method call (start) and return (complete), and captures a method-call hierarchy.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These 60 event logs vary in the number of cases and the density of overlapping cases. Each log has the following event attributes: event id, case id, activity, timestamp, loan type, amount, resources, and status. BPMN scenarios were used to simulate the process.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General Description
Our company sells goods overseas. After receiving an order, the shipment of goods is scheduled. According to this schedule, the goods are picked up from the local production site and brought to a terminal where a logistics service provider receives and ships them.
This is an artificial event log according to the OCEL 2.0 Standard simulated using CPN-Tools. Both the CPN and the SQLite can be downloaded.
Process Overview
From a customer order perspective, the process begins when the order is registered at our company (register customer order). After registration, a transport document is created in which details of the further process are recorded (create transport document).
Using this information, the logistics service provider is contacted to coordinate the transport of the ordered goods to the seaport. Twice a week, that provider sends a vehicle to a terminal, with a limited capacity for containers of ordered goods to be transported from the terminal to a seaport. For our company, available capacities vary from vehicle to vehicle, as we are not the only company booking spots. Once the logistics service provider receives our transport documents, they book capacities according to availability and container prioritizations in the upcoming weeks (book vehicles). Once the dates for transporting the goods to the terminal are set, our company contacts a container depot to reserve the required containers (order empty containers).
When a container’s vehicle departure approaches, the goods are prepared, packed and shipped to the terminal. For this purpose, a truck is sent to the container depot (pick up empty container). Meanwhile, the ordered goods to be shipped are packed into handling units at the production site. After loading the handling units (load truck), the truck drives the full container to the terminal (drive to terminal).
At the terminal, the container is picked up by a free forklift and weighed (weigh). Unless the vehicle departure is imminent, the container is placed in the storage location at the terminal (place in stock). Finally, it is moved to the vehicle (bring to loading bay, load to vehicle) which departs at a fixed time (depart).
Despite careful planning, containers sometimes miss a vehicle’s departure. In this case, the container is rescheduled to the next possible vehicle (reschedule container) and kept near the loading ramp until then.
Further information can be found at: https://www.ocel-standard.org/beta/event-logs/simulations/logistics/
General Properties
An overview of log properties is given below.
| Property | Value |
|---|---|
| Event Types | 14 |
| Object Types | 7 |
| Events | 35761 |
| Objects | 14013 |
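Counts like those in the table above can be computed directly from the SQLite export with Python's standard sqlite3 module. In this sketch, the table and column names (`event`, `object`, `ocel_id`, `ocel_type`) are my reading of the OCEL 2.0 relational schema and should be verified against the specification; a tiny in-memory stand-in is built here instead of opening the real log file.

```python
import sqlite3

# Build a tiny in-memory stand-in for an OCEL 2.0 SQLite export. The
# table/column names (event, object, ocel_id, ocel_type) are assumptions
# based on the OCEL 2.0 relational schema; check them against the spec
# before running this on the actual log.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE event  (ocel_id TEXT, ocel_type TEXT);
    CREATE TABLE object (ocel_id TEXT, ocel_type TEXT);
    INSERT INTO event  VALUES ('e1', 'register customer order'),
                              ('e2', 'create transport document');
    INSERT INTO object VALUES ('o1', 'Customer Order'),
                              ('o2', 'Transport Document');
""")

def ocel_summary(con):
    """Count events, objects, and their distinct types."""
    q = lambda sql: con.execute(sql).fetchone()[0]
    return {
        "events":       q("SELECT COUNT(*) FROM event"),
        "objects":      q("SELECT COUNT(*) FROM object"),
        "event types":  q("SELECT COUNT(DISTINCT ocel_type) FROM event"),
        "object types": q("SELECT COUNT(DISTINCT ocel_type) FROM object"),
    }

print(ocel_summary(con))
```

On the real log, the same queries should reproduce the property table (35761 events, 14013 objects, 14 event types, 7 object types), provided the schema assumption holds.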
Control-Flow Behavior
The behavior of the log is described by a respective object-centric Petri net. Also, individual object types exhibit behavior that can be described by simpler Petri nets. See below.
| Container | Transport Documents |
| Customer Order | Truck |
| Forklift | Vehicle |
| Handling Unit |
Object Relationships
During the process, object-to-object relations can emerge at activity occurrences as follows.
| Activity | Source Object Type | Target Object Type | Qualifier |
|---|---|---|---|
| Create Transport Document | Customer Order | Transport Document | TD for CO |
| Book Vehicle | Transport Document | Vehicle | Regular VH for TD |
| Book Vehicle | Transport Document | Vehicle | High-Prio VH for TD |
| Order Empty Containers | Transport Document | Container | CR for TD |
| Pick Empty Container | Truck | Container | TR loads CR |
| Load Truck | Container | Handling Unit | CR contains HU |
| Reschedule Container | Transport Document | Vehicle | Substitute VH for TD |
Simulation Model
The CPN used to create this event log can also be downloaded. To obtain simulated data, extract the linked ZIP file and play out the CPN therein, e.g., by using CPN Tools.
The play-out produces CSV files according to the schema of OCEL2.0. This Python notebook can be used to convert these files to an SQLite dump.
For a technical documentation of the simulation model, please open the attached CPN with CPN Tools and see the annotations therein.
Acknowledgements
Funded under the Excellence Strategy of the Federal Government and the Länder. Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy - EXC-2023 Internet of Production - 390621612. We also thank the Alexander von Humboldt (AvH) Stiftung for supporting our research.
Heuristics Miner