Augmentation of Event Logs under User-Defined Process Constraints
CIMBRO, LETIZIA
2024/2025
Abstract
In Process Mining, event log augmentation plays a crucial role in overcoming the scarcity of real-world data. Current augmentation techniques focus mainly on reproducing global patterns and provide little control over domain-specific rules. In practice, however, augmented logs may need to comply with explicit constraints to be meaningful for further analysis. This thesis introduces a constraint-aware framework for event log augmentation. The approach builds one automaton per rule and intersects these automata with a probabilistic automaton learned from all input traces, including non-compliant ones, so that the augmented logs preserve both the variability of the real data and the conditions imposed by the constraints. The framework was evaluated on several case studies and compared with a probabilistic automaton trained on a preprocessed event log, from which non-compliant traces were removed to enforce a set of human-defined rules before generating new synthetic traces. Results based on entropy metrics indicate that the proposed method generalizes better, generating a larger number of traces while still satisfying the imposed rules, with computation times that remain competitive. These findings confirm the practicality of constraint-aware augmentation and open promising directions for extensions involving additional process perspectives such as resources, attributes, and temporal relations.

| File | Size | Format |
|---|---|---|
| Cimbro_Letizia.pdf (open access) | 3.14 MB | Adobe PDF |
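
The intersection idea summarized in the abstract can be illustrated with a small Python sketch. It is not the thesis implementation: it assumes a first-order Markov model as the probabilistic automaton, a single rule template ("every `a` is eventually followed by `b`"), and local renormalization of probabilities over the moves that can still end in a compliant trace. All function names (`learn_pa`, `response`, `intersect`, `live_states`, `sample`) are hypothetical.

```python
import random
from collections import defaultdict

START, END = "<start>", "<end>"

def learn_pa(traces):
    """Probabilistic automaton (assumed first-order): state = last activity."""
    counts = defaultdict(lambda: defaultdict(int))
    for trace in traces:  # all traces are used, compliant or not
        prev = START
        for act in list(trace) + [END]:
            counts[prev][act] += 1
            prev = act
    return {s: {a: c / sum(nxt.values()) for a, c in nxt.items()}
            for s, nxt in counts.items()}

def response(a, b):
    """Rule DFA for 'every a is eventually followed by b'.
    State 0: nothing pending (accepting). State 1: waiting for b."""
    def step(q, act):
        if act == a:
            return 1
        if act == b:
            return 0
        return q
    return step, 0, {0}

def intersect(rules):
    """Product of the rule DFAs: accept only if every rule accepts."""
    def step(qs, act):
        return tuple(r[0](q, act) for r, q in zip(rules, qs))
    def accepts(qs):
        return all(q in r[2] for r, q in zip(rules, qs))
    return step, tuple(r[1] for r in rules), accepts

def live_states(pa, step, start, accepts):
    """Product states from which a compliant end of trace is reachable."""
    reach, stack = {(START, start)}, [(START, start)]
    while stack:  # forward pass: reachable (activity, rule-state) pairs
        s, q = stack.pop()
        for act in pa[s]:
            nxt = (act, step(q, act))
            if act != END and nxt not in reach:
                reach.add(nxt)
                stack.append(nxt)
    live, changed = set(), True
    while changed:  # backward fixed point over the reachable pairs
        changed = False
        for s, q in reach - live:
            if any(accepts(q) if act == END else (act, step(q, act)) in live
                   for act in pa[s]):
                live.add((s, q))
                changed = True
    return live

def sample(pa, step, start, accepts, live, rng, max_len=50):
    """Sample one trace, following only moves that stay in live states."""
    s, q, trace = START, start, []
    while len(trace) <= max_len:
        options = {act: p for act, p in pa[s].items()
                   if (accepts(q) if act == END
                       else (act, step(q, act)) in live)}
        if not options:
            return None  # no compliant continuation exists
        acts = list(options)
        act = rng.choices(acts, weights=[options[x] for x in acts])[0]
        if act == END:
            return trace
        trace.append(act)
        s, q = act, step(q, act)
    return None  # gave up: trace grew beyond max_len

if __name__ == "__main__":
    log = [["a", "b", "c"], ["a", "c"], ["a", "b", "b", "c"]]
    pa = learn_pa(log)
    step, start, accepts = intersect([response("a", "b")])
    live = live_states(pa, step, start, accepts)
    rng = random.Random(0)
    for _ in range(5):
        print(sample(pa, step, start, accepts, live, rng))
```

In this toy log the trace `a, c` violates the rule, yet it still contributes transition counts to the probabilistic automaton; the product with the rule automaton then blocks the move from `a` directly to `c` at generation time rather than discarding the trace beforehand.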
The text of this website is © Università degli Studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/91824