100+ datasets found
  1. Public benchmark dataset for Conformance Checking in Process Mining

    • figshare.unimelb.edu.au
    • melbourne.figshare.com
    xml
    Updated Jan 30, 2022
    Cite
    Daniel Reissner (2022). Public benchmark dataset for Conformance Checking in Process Mining [Dataset]. http://doi.org/10.26188/5cd91d0d3adaa
    Dataset updated
    Jan 30, 2022
    Dataset provided by
    The University of Melbourne
    Authors
    Daniel Reissner
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains a variety of publicly available real-life event logs. For each event log, we derived two types of Petri nets with two state-of-the-art process miners: Inductive Miner (IM) and Split Miner (SM). Each event log-Petri net pair is intended for evaluating the scalability of existing conformance checking techniques. We used this dataset to evaluate the scalability of the S-Component approach for measuring fitness. The dataset contains tables of descriptive statistics of both process models and event logs. In addition, it includes time-performance results, measured in milliseconds, for several approaches in both multi-threaded and single-threaded executions. Last, it contains a cost comparison of different approaches and reports on the degree of over-approximation of the S-Component approach. The compared conformance checking techniques are described here: https://arxiv.org/abs/1910.09767.

    Update: The dataset has been extended with the event logs of BPIC18 and BPIC19. BPIC19 is actually a collection of four different processes and was thus split into four event logs. For each of the additional five event logs, again, two process models have been mined with Inductive and Split Miner. We used the extended dataset to test the scalability of our tandem repeats approach for measuring fitness. The dataset now contains updated tables of log and model statistics, as well as tables of the conducted experiments measuring execution time and raw fitness cost of various fitness approaches. The compared conformance checking techniques are described here: https://arxiv.org/abs/2004.01781.

    Update: The dataset has also been used to measure the scalability of a new Generalization measure based on concurrent and repetitive patterns. A concurrency oracle is used in tandem with partial orders to identify concurrent patterns in the log, which are tested against parallel blocks in the process model. Tandem repeats are used with various trace reductions and extensions to define repetitive patterns in the log, which are tested against loops in the process model. Each pattern is assigned a partial fulfillment. The generalization is then the average of pattern fulfillments, weighted by the counts of the traces for which the patterns have been observed. The dataset now includes the time results and a breakdown of Generalization values.
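
    Descriptive log statistics like those tabulated in this dataset start from trace variants. Below is a minimal, standard-library-only sketch of counting variants in an XES log; the XES fragment is invented for illustration, and real XES files additionally declare the http://www.xes-standard.org/ namespace and carry timestamp and lifecycle attributes that this toy omits.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Toy XES fragment (invented): two traces, no namespace for brevity.
XES = """<log>
  <trace>
    <event><string key="concept:name" value="register"/></event>
    <event><string key="concept:name" value="check"/></event>
  </trace>
  <trace>
    <event><string key="concept:name" value="register"/></event>
  </trace>
</log>"""

def variant_counts(xes_text):
    """Count trace variants, i.e. distinct sequences of activity names."""
    variants = Counter()
    for trace in ET.fromstring(xes_text).iter("trace"):
        activities = tuple(
            s.get("value")
            for ev in trace.iter("event")
            for s in ev.iter("string")
            if s.get("key") == "concept:name"
        )
        variants[activities] += 1
    return variants

print(variant_counts(XES))
# Counter({('register', 'check'): 1, ('register',): 1})
```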

  2. Production Analysis with Process Mining Technology

    • data.4tu.nl
    zip
    Updated Jan 28, 2014
    Cite
    Dafna Levy (2014). Production Analysis with Process Mining Technology [Dataset]. http://doi.org/10.4121/uuid:68726926-5ac5-4fab-b873-ee76ea412399
    Dataset updated
    Jan 28, 2014
    Dataset provided by
    NooL - Integrating People & Solutions
    Authors
    Dafna Levy
    License

    https://doi.org/10.4121/resource:terms_of_use

    Description

    The comma-separated value dataset contains process data from a production process, including cases, activities, resources, timestamps, and other data fields.

  3. CeLOE event log sample

    • dataverse.telkomuniversity.ac.id
    tsv
    Updated Apr 20, 2022
    Cite
    Telkom University Dataverse (2022). CeLOE event log sample [Dataset]. http://doi.org/10.34820/FK2/9FT77M
    Available download formats: tsv (10066), tsv (19847)
    Dataset updated
    Apr 20, 2022
    Dataset provided by
    Telkom University Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This study analyses an event log, automatically generated by the CeLOE LMS, that records student and lecturer activities in learning. The event log is mined to obtain a process model representing learning behaviours of the lecturers and students during the learning process. The case study in this research is learning in the study program 365 during the first semester of 2020/2021.

  4. Data from: An IoT-Enriched Event Log for Process Mining in Smart Factories

    • data.niaid.nih.gov
    Updated Jun 10, 2024
    + more versions
    Cite
    Malburg, Lukas (2024). An IoT-Enriched Event Log for Process Mining in Smart Factories [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_7795547
    Dataset updated
    Jun 10, 2024
    Dataset provided by
    Grüger, Joscha
    Malburg, Lukas
    Bergmann, Ralph
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    DEPRECATED - current version: https://figshare.com/articles/dataset/Dataset_An_IoT-Enriched_Event_Log_for_Process_Mining_in_Smart_Factories/20130794

    Modern technologies such as the Internet of Things (IoT) are becoming increasingly important in various domains, including Business Process Management (BPM) research. One main research area in BPM is process mining, which can be used to analyze event logs, e.g., for checking the conformance of running processes. However, only a few IoT-based event logs are available for research purposes. Some of them are artificially generated, with the drawback that they do not always completely reflect the actual physical properties of smart environments. In this paper, we present an IoT-enriched XES event log generated by a physical smart factory. For this purpose, we created the DataStream XES extension for representing IoT data in event logs. Finally, we present some preliminary analysis and properties of the log.

  5. Process Models obtained from event logs with different...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Sander J.J. Leemans (2020). Process Models obtained from event logs with with different information-preserving abstractions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3243987
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Sander J.J. Leemans
    Dirk Fahland
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains results of the experiment to analyze information preservation and recovery by different event log abstractions in process mining described in: Sander J.J. Leemans, Dirk Fahland "Information-Preserving Abstractions of Event Data in Process Mining" Knowledge and Information Systems, ISSN: 0219-1377 (Print) 0219-3116 (Online), accepted May 2019

    The experiment results were obtained with: https://doi.org/10.5281/zenodo.3243981

  6. JUnit 4.12 Software Event Log

    • figshare.com
    • data.4tu.nl
    txt
    Updated Jun 4, 2023
    Cite
    Maikel Leemans (2023). JUnit 4.12 Software Event Log [Dataset]. http://doi.org/10.4121/uuid:cfed8007-91c8-4b12-98d8-f233e5cd25bb
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    4TU.ResearchData
    Authors
    Maikel Leemans
    License

    https://doi.org/10.4121/resource:terms_of_use

    Description

    XES software event log obtained through instrumenting JUnit 4.12 using the tool available at {https://svn.win.tue.nl/repos/prom/XPort/}. This event log contains method-call level events describing a single run of the JUnit 4.12 software, available at {https://mvnrepository.com/artifact/junit/junit/4.12} , using the input from {https://github.com/junit-team/junit4/wiki/Getting-started}. Note that the life-cycle information in this log corresponds to method call (start) and return (complete), and captures a method-call hierarchy.
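
    Given the start (call) / complete (return) life-cycle semantics described above, the method-call hierarchy can be rebuilt with a simple stack. Here is a sketch over invented data; the method names below are illustrative JUnit-like examples, not actual events from this log.

```python
# Flat start/complete event stream; method names are invented examples.
events = [
    ("Runner.run", "start"),
    ("TestCase.setUp", "start"),
    ("TestCase.setUp", "complete"),
    ("TestCase.runTest", "start"),
    ("TestCase.runTest", "complete"),
    ("Runner.run", "complete"),
]

def call_tree(events):
    """Rebuild the method-call hierarchy: 'start' pushes a frame, 'complete' pops it."""
    roots, stack = [], []
    for method, lifecycle in events:
        if lifecycle == "start":
            node = {"method": method, "children": []}
            (stack[-1]["children"] if stack else roots).append(node)
            stack.append(node)
        else:  # "complete" must match the innermost open call
            assert stack and stack[-1]["method"] == method, "unbalanced log"
            stack.pop()
    return roots

tree = call_tree(events)
print(tree[0]["method"], [c["method"] for c in tree[0]["children"]])
# Runner.run ['TestCase.setUp', 'TestCase.runTest']
```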

  7. (Un)Fair Process Mining Event Logs (Converted to OCEL)

    • zenodo.org
    bin, xml
    Updated Nov 6, 2024
    Cite
    Alessandro Berti (2024). (Un)Fair Process Mining Event Logs (Converted to OCEL) [Dataset]. http://doi.org/10.5281/zenodo.14043725
    Dataset updated
    Nov 6, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alessandro Berti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Converted to OCEL 1.0 JSONOCEL and OCEL 2.0 XML from traditional event logs available at: Zenodo - Record 8059489.

    Object Types: { Person }

    Person-level Attributes:

    • (int) overallProtected: An attribute (0/1) indicating whether the person has experienced discrimination. (Note: If you're developing a fairness assessment algorithm, only use this attribute in the testing phase!)
    • (int) sumBoolDiscrFactors: Counts the number of possible discrimination factors that apply to the person.
    • (int) reworkedActivities: The total amount of rework involved in the person’s processing.
    • (float) throughputTime: The total processing time for a person.
    • (int) numOcc_ACTIVITY: Counts the number of times an activity occurs in the person’s lifecycle.

    Event-level Attributes:

    • resource: The resource involved in processing a given person.
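
    To illustrate how these person-level attributes can be read, here is a sketch against a toy JSONOCEL fragment. The "ocel:objects"/"ocel:ovmap" field names follow OCEL 1.0 JSON conventions, while the object ids and attribute values are invented, so treat this as a sketch to adapt rather than a verified reader.

```python
import json

# Toy JSONOCEL fragment: layout assumed from OCEL 1.0 JSON conventions,
# ids and values invented.
OCEL = json.loads("""{
  "ocel:objects": {
    "p1": {"ocel:type": "Person",
           "ocel:ovmap": {"overallProtected": 1, "throughputTime": 12.5}},
    "p2": {"ocel:type": "Person",
           "ocel:ovmap": {"overallProtected": 0, "throughputTime": 3.0}}
  }
}""")

def persons_with(ocel, attr):
    """Collect one person-level attribute for every Person object."""
    return {
        oid: obj["ocel:ovmap"][attr]
        for oid, obj in ocel["ocel:objects"].items()
        if obj["ocel:type"] == "Person"
    }

# Reminder from above: use overallProtected only in the testing phase
# of a fairness assessment algorithm.
print(persons_with(OCEL, "overallProtected"))  # {'p1': 1, 'p2': 0}
```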

    * Hiring

    The data describes a multifaceted recruitment process with diverse application pathways ranging from minimal processing to extensive multi-step procedures. The variability of these routes, largely dependent on numerous determinants, yields a spectrum of outcomes from instant rejection to successful job offers.

    The logs include attributes such as age, citizenship, German proficiency, gender, religion, and years of education. While these attributes may inform candidate profiles, their misuse could engender discrimination. Variables like age and education may signify experience and skills, citizenship and German language may address job logistics, but these should not unjustly eliminate applicants. Gender and religion, unrelated to job performance, must not sway hiring. Therefore, the use of these attributes must uphold fairness, avoiding any potential bias.

    * Hospital

    The data depicts a hospital treatment process that commences with registration at an Emergency Room or Family Department and advances through stages of examination, diagnosis, and treatment. Notably, unsuccessful treatments often entail repetitive diagnostic and treatment cycles, underscoring the iterative nature of healthcare provision.

    The logs incorporate patient attributes such as age, underlying condition, citizenship, German language proficiency, gender, and private insurance. These attributes, influencing the treatment process, may unveil potential discrimination. Factors like age and condition might affect case complexity and treatment path, while citizenship may highlight healthcare access disparities. German proficiency can impact provider-patient communication, thus affecting care quality. Gender could spotlight potential health disparities, while insurance status might indicate socio-economic influences on care quality or timeliness. Therefore, a comprehensive examination of these attributes vis-a-vis the treatment process could shed light on potential biases or disparities, fostering fairness in healthcare delivery.

    * Lending

    This data illustrates the steps within a loan application process. From an initial appointment request, the process navigates various stages, including information verification and underwriting, culminating in loan approval or denial. Additional steps may be required, such as co-signer enlistment or collateral assessment. Some cases experience outright appointment denial, indicating the process's variability, reflecting applicants' differing credit situations.

    The logs' attributes can aid in identifying influences on outcomes and detecting discrimination. Personal characteristics ('age', 'citizen', 'German speaking', and 'gender') and socio-economic indicators ('YearsOfEducation' and 'CreditScore') can impact the process. While 'yearsOfEducation' and 'CreditScore' can validly inform creditworthiness, 'age', 'citizen', 'language ability', and 'gender' should not bias loan decisions; ensuring these attributes are used responsibly fosters equitable loan processes.

    * Renting

    The data represents a rental process. It begins with a prospective tenant applying to view a property. Subsequent steps include an initial screening phase, viewing, decision-making, and a potential extensive screening. The process ends with the acceptance or rejection of the prospective tenant. In some cases, a tenant may apply for viewing but be rejected without the viewing occurring.

    The logs contain attributes that can shed light on potential biases in the process. 'Age', 'citizen', 'German speaking', 'gender', 'religious affiliation', and 'yearsOfEducation' might influence the rental process, leading to potential discrimination. While some attributes may provide useful insights into a potential tenant's reliability, misuse could result in discrimination. Thus, fairness must be observed in utilizing these attributes to avoid potential biases and ensure equitable treatment.

  8. Real-life event logs - Hospital log

    • figshare.com
    txt
    Updated Jul 25, 2020
    Cite
    Boudewijn van Dongen (2020). Real-life event logs - Hospital log [Dataset]. http://doi.org/10.4121/uuid:d9769f3d-0ab0-4fb8-803b-0d1120ffcf54
    Dataset updated
    Jul 25, 2020
    Dataset provided by
    4TU.ResearchData
    Authors
    Boudewijn van Dongen
    License

    https://doi.org/10.4121/resource:terms_of_use

    Description

    Real life log of a Dutch academic hospital, originally intended for use in the first Business Process Intelligence Contest (BPIC 2011)

  9. Simplified Event Logs for Sepsis Patient Trajectories

    • data.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Ghahremani, Sona (2024). Simplified Event Logs for Sepsis Patient Trajectories [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3989589
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Sakizloglou, Lucas
    Ghahremani, Sona
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains a simplified excerpt from a real event log that tracks the trajectories of patients admitted to a hospital to be treated for sepsis, a life-threatening condition. The log has been recorded by the Enterprise Resource Planning of the hospital. Additionally, the dataset contains three synthetic logs that increase the number of trajectories within the original log timespan, while maintaining other statistical characteristics.

    In total, the dataset contains four files in .zip format and a companion document that describes the statistical method used to synthesize the logs, as well as the dataset content in detail. The dataset can be used to test the performance of event-based process mining and log (runtime) monitoring tools against an increasing load of events.

  10. Validation of Precision Measures - Event Logs and Process Models

    • figshare.com
    • data.4tu.nl
    zip
    Updated Jun 8, 2023
    Cite
    Niek Tax (2023). Validation of Precision Measures - Event Logs and Process Models [Dataset]. http://doi.org/10.4121/uuid:991753f7-a240-4ba6-a8a8-67174a08c51b
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    4TU.ResearchData
    Authors
    Niek Tax
    License

    https://doi.org/10.4121/resource:terms_of_use

    Description

    This collection contains the event logs and process models described and used in the paper "The Imprecisions of Precision Measures in Process Mining"

  11. Process Mining-Based Goal Recognition System Evaluation Dataset

    • figshare.unimelb.edu.au
    application/bzip2
    Updated Aug 11, 2023
    Cite
    Zihang Su (2023). Process Mining-Based Goal Recognition System Evaluation Dataset [Dataset]. http://doi.org/10.26188/21749570.v4
    Dataset updated
    Aug 11, 2023
    Dataset provided by
    The University of Melbourne
    Authors
    Zihang Su
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    These datasets are used for evaluating the process mining-based goal recognition system proposed in the paper "Fast and Accurate Data-Driven Goal Recognition Using Process Mining Techniques." The datasets include a running example, an evaluation dataset for synthetic domains, and real-world business logs.

    running_example.tar.bz contains the traces shown in figure 2 of the paper, used for learning six skill models toward six goal candidates, and the three walks shown in figure 1.a.

    synthetic_domains.tar.bz2 is the dataset for evaluating the GR system in synthetic domains (IPC domains). There are two types of traces used for learning skill models: those generated by the top-k planner and those generated by the diverse planner. Please extract the archived domains located in topk/ and diverse/. In each domain, the sub-folder problems/ contains the dataset for learning skill models, and the sub-folder test/ contains the traces (plans) for testing the GR performance. There are five levels of observations: 10%, 30%, 50%, 70%, and 100%. For each level of observation, there are multiple problem instances; the instance ID starts from 0. A problem instance contains the synthetic domain model (PDDL files), training traces (in train/), and an observation for testing (obs.dat). The top-k and diverse planners for generating traces can be accessed here. The original PDDL models of the problem instances for the 15 IPC domains mentioned in the paper are available here.

    business_logs.tar.bz is the dataset for evaluating the GR system in real-world domains. There are two types of problem instances: one with only two goal candidates (yes or no), referred to as "binary," and the other containing multiple goal candidates, termed "multiple." Please extract the archived files located in the directories binary/ and multiple/. The traces for learning the skill models can be found in XES files, and the traces (plans) for testing can be found in the directory goal*/.

  12. UIS Log: Synthetic User Interface with Screenshots Log

    • zenodo.org
    zip
    Updated Sep 9, 2024
    Cite
    Antonio Martínez-Rojas; Andrés Jiménez-Ramírez; José González Enríquez; Hajo A. Reijers (2024). UIS Log: Synthetic User Interface with Screenshots Log [Dataset]. http://doi.org/10.5281/zenodo.5734323
    Dataset updated
    Sep 9, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Antonio Martínez-Rojas; Andrés Jiménez-Ramírez; José González Enríquez; Hajo A. Reijers
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These data correspond to the set of problems for evaluating the proposal detailed in Martínez-Rojas et al. 2022. The evaluation utilizes a set of synthetic problems that simulate realistic administrative use cases. Each problem includes a UI Log with a synthetic screenshot corresponding to each event, capturing 3 distinct processes (P) marked by varying complexity levels. These levels are defined by the number of activities, the process execution variants, and the visual features influencing decisions between these variants.

    The implementation of this proposal can be found in the tool available at this GitHub repository, which utilizes the logs of these 3 processes for validation. Here they are described:

    • P1 Client creation. A process with 5 activities and 2 variants. The single decision in this process is made based on the existence of an attachment in the reception email.
    • P2 Client validation. A process with 7 activities and 2 variants. The decision is made based on the user’s response to a query.
    • P3 Client deletion. A process with 7 activities and 4 variants. The decisions are made based on two conditions: (1) the existence of pending invoices and (2) the existence of an attachment to justify the payment of the invoices.

    These processes all contain a single decision point, although the one in P3 is complex. All processes include

    1. synthetic screen captures for their activities and
    2. a sample event log with a single instance for each variant.

    To generate the objects for the evaluation, we generate event logs of different sizes (|L|) for each of these processes by deriving events from the sample event log. We consider log sizes in the range of {10, 25, 50, 100} events. Note that we consider only complete instances in the log; thus, we remove the last instance if it goes beyond |L|.
    Some of these logs are generated with a balanced number of instances, while others are unbalanced (B?), presenting more than a 20% difference in frequency between the most frequent and least frequent variants. To average the result over a collection of problems, 30 instances are randomly generated for each tuple < P, |L|, B? >.
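
    The generation procedure above can be sketched as follows. The variant sequences and the weighting used to skew unbalanced logs are invented stand-ins (the dataset only specifies the resulting frequency difference of more than 20%), so this approximates the idea rather than reproducing the authors' generator.

```python
import random

def generate_log(variants, size, balanced=True, seed=0):
    """Draw complete instances from the sample variants until |L| events are
    reached; a final instance that would exceed `size` is dropped."""
    rng = random.Random(seed)
    # Invented weighting: the dataset only states that unbalanced logs differ
    # by more than 20% in variant frequency, not how that skew is produced.
    weights = [1] * len(variants) if balanced else [3] + [1] * (len(variants) - 1)
    log, n_events = [], 0
    while True:
        variant = rng.choices(variants, weights=weights)[0]
        if n_events + len(variant) > size:
            return log
        log.append(list(variant))
        n_events += len(variant)

# Illustrative 5-activity sequences standing in for P1's two variants.
p1_variants = [["A", "B", "C", "D", "E"], ["A", "B", "C", "X", "E"]]
log = generate_log(p1_variants, size=25)
print(len(log), sum(len(t) for t in log))  # 5 25
```
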
    In this dataset there are 3 zips, one for each family. Each family corresponds to a process:

    • Basic corresponds to P1
    • Intermediate corresponds to P2
    • Advanced corresponds to P3

    Within these folders, we find 30 different scenarios (folders), in which the look and feel of the applications shown in the screenshots varies slightly. Within each scenario, the data entered in the forms and the images or attachments present in the user interface are varied to generate log instances according to the characteristics of each process.
    For each scenario, we find 8 folders, each holding a concrete problem defined by Log_size (in {10, 25, 50, 100}) and Balanced (in {Balanced, Unbalanced}). The names of these folders have the format Family_LogSize_Balanced.
    Inside each problem folder, the UI Log and the screen captures can be found.

    References

    Martínez-Rojas, A., Jiménez-Ramírez, A., Enríquez, J. G., & Reijers, H. A. (2022, September). Analyzing variable human actions for robotic process automation. In International Conference on Business Process Management (pp. 75-90). Cham: Springer International Publishing.

  13. A Collection of Event Logs of Blockchain-based Applications

    • data.niaid.nih.gov
    Updated Jan 29, 2023
    Cite
    Alzhrani, Fouzia (2023). A Collection of Event Logs of Blockchain-based Applications [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6637058
    Dataset updated
    Jan 29, 2023
    Dataset authored and provided by
    Alzhrani, Fouzia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A set of event logs of 101 blockchain-based applications (DApps). For each DApp, there are two event log files: a raw version, in which data is encoded as it is stored on the blockchain, and a decoded version, in which data is translated into a human-readable format. If a DApp has multiple versions on different blockchain networks, then there are two event log files (encoded and decoded) for each version. In addition, the event registry file includes a comprehensive list of event names and their corresponding signatures, obtained from the contract ABIs of the 101 DApps.

  14. Data from: Differentially Private Release of Event Logs for Process Mining

    • zenodo.org
    application/gzip
    Updated Oct 26, 2021
    Cite
    Gamal Elkoumy; Alisa Pankova; Marlon Dumas (2021). Differentially Private Release of Event Logs for Process Mining [Dataset]. http://doi.org/10.5281/zenodo.5599454
    Dataset updated
    Oct 26, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gamal Elkoumy; Alisa Pankova; Marlon Dumas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The anonymized event logs of the experiments in the paper "Differentially Private Release of Event Logs for Process Mining"

  15. Differentially Private Event Logs for Process Mining: Supplementary Material...

    • data.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Gamal Elkoumy; Alisa Pankova; Marlon Dumas (2024). Differentially Private Event Logs for Process Mining: Supplementary Material [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4601138
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Cybernetica, Tartu, Estonia
    University of Tartu, Tartu Estonia
    Authors
    Gamal Elkoumy; Alisa Pankova; Marlon Dumas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this archive, we provide supplementary material for our paper entitled "Mine Me but Don't Single Me Out: Differentially Private Event Logs for Process Mining". We list the selected event logs together with their characteristics and descriptive statistics. This archive also contains the anonymized event logs resulting from the experiments. The source code is available on GitHub.

  16. Correlation Data for Species-Coverage-based Log Representativeness and TLRA

    • figshare.unimelb.edu.au
    xlsx
    Updated Aug 4, 2025
    Cite
    Anandi Karunaratne; Artem Polyvyanyy (2025). Correlation Data for Species-Coverage-based Log Representativeness and TLRA [Dataset]. http://doi.org/10.26188/26410747.v1
    Dataset updated
    Aug 4, 2025
    Dataset provided by
    The University of Melbourne
    Authors
    Anandi Karunaratne; Artem Polyvyanyy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the correlation data for the species-coverage-based log representativeness measure and the Trace-based Log Representativeness Approximation (TLRA) across event logs of 60 generative systems, with varying log sizes and noise levels.

  17. Statechart Workbench and Alignments Software Event Log

    • search.datacite.org
    Updated Aug 31, 2018
    Cite
    Maikel Leemans (2018). Statechart Workbench and Alignments Software Event Log [Dataset]. http://doi.org/10.4121/uuid:7f787965-da13-4bb8-a3fd-242f08aef9c4
    Dataset updated
    Aug 31, 2018
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Eindhoven University of Technology
    Authors
    Maikel Leemans
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Extensible Event Stream (XES) software event log obtained through instrumenting the Statechart Workbench ProM plugin using the tool available at {https://svn.win.tue.nl/repos/prom/XPort/}. This event log contains method-call level events describing a workbench run invoking the Alignments algorithm using the BPI Challenge 2012 event log available and documented at {https://doi.org/10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f} . Note that the life-cycle information in this log corresponds to method call (start) and return (complete), and captures a method-call hierarchy.

  18. Event log with data attributes

    • figshare.com
    xml
    Updated Aug 31, 2022
    Cite
    Dina Bayomie (2022). Event log with data attributes [Dataset]. http://doi.org/10.6084/m9.figshare.20736706.v1
    Dataset updated
    Aug 31, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Dina Bayomie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These 60 event logs vary in the number of cases and the density of overlapping cases. Each log has the following event attributes: event id, case id, activity, timestamp, loan type, amount, resources, and status. BPMN scenarios were used to simulate the process.

  19. Container Logistics Object-centric Event Log

    • zenodo.org
    • data.niaid.nih.gov
    bin, json, xml, zip
    Updated Oct 10, 2023
    Cite
    Benedikt Knopp; Benedikt Knopp; Nina Graves; Nina Graves (2023). Container Logistics Object-centric Event Log [Dataset]. http://doi.org/10.5281/zenodo.8428084
    Explore at:
    Available download formats: json, zip, xml, bin
    Dataset updated
    Oct 10, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Benedikt Knopp; Benedikt Knopp; Nina Graves; Nina Graves
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General Description

    Our company sells goods overseas. After receiving an order, the shipment of goods is scheduled. According to this schedule, the goods are picked up from the local production site and brought to a terminal where a logistics service provider receives and ships them.

    This is an artificial event log conforming to the OCEL 2.0 standard, simulated using CPN Tools. Both the CPN and the SQLite database can be downloaded.

    Process Overview

    From a customer order perspective, the process begins when the order is registered at our company (register customer order). After registration, a transport document is created in which details of the further process are recorded (create transport document).

    Using this information, the logistics service provider is contacted to coordinate the transport of the ordered goods to the seaport. Twice a week, that provider sends a vehicle to a terminal, with a limited capacity for containers of ordered goods to be transported from the terminal to a seaport. For our company, available capacities vary from vehicle to vehicle, as we are not the only company booking spots. Once the logistics service provider receives our transport documents, they book capacities according to availability and container prioritizations in the upcoming weeks (book vehicles). Once the dates for transporting the goods to the terminal are set, our company contacts a container depot to reserve the required containers (order empty containers).

    When a container’s vehicle departure approaches, the goods are prepared, packed and shipped to the terminal. For this purpose, a truck is sent to the container depot (pick up empty container). Meanwhile, the ordered goods to be shipped are packed into handling units at the production site. After loading the handling units (load truck), the truck drives the full container to the terminal (drive to terminal).

    At the terminal, the container is picked up by a free forklift and weighed (weigh). Unless the vehicle departure is imminent, the container is placed in the storage location at the terminal (place in stock). Finally, it is moved to the vehicle (bring to loading bay, load to vehicle) which departs at a fixed time (depart).

    Despite careful planning, containers sometimes miss a vehicle’s departure. In this case, the container is rescheduled to the next possible vehicle (reschedule container) and kept near the loading ramp until then.

    Further information can be found at: https://www.ocel-standard.org/beta/event-logs/simulations/logistics/

    General Properties

    An overview of log properties is given below.

    Property      Value
    Event Types   14
    Object Types  7
    Events        35761
    Objects       14013

    Control-Flow Behavior

    The behavior of the log is described by a respective object-centric Petri net. In addition, each object type exhibits behavior that can be described by a simpler, per-type Petri net, one for each of: Container, Transport Document, Customer Order, Truck, Forklift, Vehicle, Handling Unit.

    Object Relationships

    During the process, object-to-object relations can emerge at activity occurrences as follows.

    Activity                   Source Object Type  Target Object Type  Qualifier
    Create Transport Document  Customer Order      Transport Document  TD for CO
    Book Vehicle               Transport Document  Vehicle             Regular VH for TD
    Book Vehicle               Transport Document  Vehicle             High-Prio VH for TD
    Order Empty Containers     Transport Document  Container           CR for TD
    Pick Empty Container       Truck               Container           TR loads CR
    Load Truck                 Container           Handling Unit       CR contains HU
    Reschedule Container       Transport Document  Vehicle             Substitute VH for TD
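
    For programmatic use, these qualified relations can be held as simple (activity, source type, target type, qualifier) tuples. The sketch below is purely illustrative; in the log itself these relations live in the OCEL 2.0 object-to-object tables.

```python
# Sketch: the qualified object-to-object relations from the table above,
# encoded as (activity, source_type, target_type, qualifier) tuples.
# The structure is illustrative, not the OCEL 2.0 storage format.
O2O = [
    ("Create Transport Document", "Customer Order", "Transport Document", "TD for CO"),
    ("Book Vehicle", "Transport Document", "Vehicle", "Regular VH for TD"),
    ("Book Vehicle", "Transport Document", "Vehicle", "High-Prio VH for TD"),
    ("Order Empty Containers", "Transport Document", "Container", "CR for TD"),
    ("Pick Empty Container", "Truck", "Container", "TR loads CR"),
    ("Load Truck", "Container", "Handling Unit", "CR contains HU"),
    ("Reschedule Container", "Transport Document", "Vehicle", "Substitute VH for TD"),
]

def qualifiers_for(activity):
    """All relation qualifiers that can emerge at the given activity."""
    return [q for a, _, _, q in O2O if a == activity]

print(qualifiers_for("Book Vehicle"))
# ['Regular VH for TD', 'High-Prio VH for TD']
```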

    Simulation Model

    The CPN used to create this event log can also be downloaded. To obtain simulated data, extract the linked ZIP file and play out the CPN therein, e.g., by using CPN Tools.

    The play-out produces CSV files according to the schema of OCEL2.0. This Python notebook can be used to convert these files to an SQLite dump.
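
    The conversion step amounts to loading each CSV into a table of the SQLite dump. The sketch below shows the idea with the standard library only; the table and column names are assumptions for illustration, not the schema used by the attached notebook.

```python
# Sketch of the CSV -> SQLite conversion step. Table and column names
# (event, ocel_id, ocel_type, ocel_time) are hypothetical stand-ins for
# the OCEL 2.0 schema used by the notebook.
import csv
import io
import sqlite3

# Invented sample of a play-out CSV (event id, event type, timestamp).
events_csv = io.StringIO(
    "ocel_id,ocel_type,ocel_time\n"
    "e1,register customer order,2023-01-02 09:00\n"
    "e2,create transport document,2023-01-02 10:30\n"
)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE event (ocel_id TEXT, ocel_type TEXT, ocel_time TEXT)")
# DictReader yields one mapping per row, which sqlite3 binds to the
# named placeholders below.
con.executemany(
    "INSERT INTO event VALUES (:ocel_id, :ocel_type, :ocel_time)",
    csv.DictReader(events_csv),
)
con.commit()

n = con.execute("SELECT COUNT(*) FROM event").fetchone()[0]
print(n)  # 2
```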

    For a technical documentation of the simulation model, please open the attached CPN with CPN Tools and see the annotations therein.

    Acknowledgements

    Funded under the Excellence Strategy of the Federal Government and the Länder. Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy - EXC-2023 Internet of Production - 390621612. We also thank the Alexander von Humboldt (AvH) Stiftung for supporting our research.

  20. Event Logs and Process Models for Evaluating Discovery Algorithm Robustness under Noise

    • ieee-dataport.org
    Updated Oct 22, 2025
    Cite
    Anandi Karunaratne (2025). Event Logs and Process Models for Evaluating Discovery Algorithm Robustness under Noise [Dataset]. https://ieee-dataport.org/documents/event-logs-and-process-models-evaluating-discovery-algorithm-robustness-under-noise
    Explore at:
    Dataset updated
    Oct 22, 2025
    Authors
    Anandi Karunaratne
    Description

    Heuristics Miner

Cite
Daniel Reissner (2022). Public benchmark dataset for Conformance Checking in Process Mining [Dataset]. http://doi.org/10.26188/5cd91d0d3adaa

Public benchmark dataset for Conformance Checking in Process Mining

Explore at:
5 scholarly articles cite this dataset
Available download formats: xml
Dataset updated
Jan 30, 2022
Dataset provided by
The University of Melbourne
Authors
Daniel Reissner
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This dataset contains a variety of publicly available real-life event logs. For each event log, we derived two Petri nets with two state-of-the-art process miners: Inductive Miner (IM) and Split Miner (SM). Each event log-Petri net pair is intended for evaluating the scalability of existing conformance checking techniques. We used this dataset to evaluate the scalability of the S-Component approach for measuring fitness. The dataset contains tables of descriptive statistics of both process models and event logs. In addition, it includes time-performance results, measured in milliseconds, for several approaches in both multi-threaded and single-threaded executions. Last, it contains a cost comparison of different approaches and reports on the degree of over-approximation of the S-Component approach. The compared conformance checking techniques are described here: https://arxiv.org/abs/1910.09767.

Update: The dataset has been extended with the event logs of BPIC18 and BPIC19. BPIC19 is in fact a collection of four different processes and was therefore split into four event logs. For each of the additional five event logs, two process models were again mined with Inductive Miner and Split Miner. We used the extended dataset to test the scalability of our tandem-repeats approach for measuring fitness. The dataset now contains updated tables of log and model statistics, as well as tables of the conducted experiments measuring execution time and raw fitness cost of various fitness approaches. The compared conformance checking techniques are described here: https://arxiv.org/abs/2004.01781.

Update: The dataset has also been used to measure the scalability of a new Generalization measure based on concurrent and repetitive patterns. A concurrency oracle is used in tandem with partial orders to identify concurrent patterns in the log, which are tested against parallel blocks in the process model. Tandem repeats are used with various trace reductions and extensions to define repetitive patterns in the log, which are tested against loops in the process model. Each pattern is assigned a partial fulfillment. The Generalization is then the average of pattern fulfillments, weighted by the counts of the traces in which the patterns were observed. The dataset now includes the time results and a breakdown of Generalization values.
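
The final weighting step described above is a count-weighted mean of pattern fulfillments. A minimal sketch, with invented pattern fulfillments and trace counts:

```python
# Sketch: Generalization as the average of pattern fulfillments weighted
# by the number of traces exhibiting each pattern. The numbers below are
# invented for illustration.
patterns = [
    # (fulfillment in [0, 1], trace count for the pattern)
    (1.0, 40),   # concurrent pattern fully matched by a parallel block
    (0.5, 10),   # repetitive pattern only partially matched by a loop
]

def generalization(patterns):
    total = sum(count for _, count in patterns)
    return sum(f * count for f, count in patterns) / total

print(generalization(patterns))  # (1.0*40 + 0.5*10) / 50 = 0.9
```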
