Heuristic-based methods are among the most popular methods in process discovery. They consist of two main steps: 1) discovering a dependency graph, and 2) determining the split/join patterns of the dependency graph. Current dependency graph discovery techniques select an initial set of graph arcs according to dependency measures and then modify that set according to certain criteria. This can lead to selecting a non-optimal set of arcs. The modifications can also result in modeling rare behaviors and, consequently, in low precision and overly complex process models. Thus, constructing dependency graphs by selecting the optimal set of arcs has high potential for improving graph quality. Hence, this paper proposes a new integer linear programming model that determines the optimal set of graph arcs with regard to dependency measures. The proposed method also eliminates some issues that the existing methods cannot handle completely; i.e., even in the presence of loops, it guarantees that all tasks are on a path from the initial task to the final task. The approach also allows domain knowledge to be incorporated by introducing appropriate constraints, which can be a practical advantage in real-world problems. To assess the results, we modified two existing methods of evaluating process models so that they can measure the quality of dependency graphs. According to assessments, the outputs of the proposed method are superior
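To make the notion of a dependency measure concrete, the following is a minimal Python sketch of the threshold-based arc selection that existing heuristic methods start from. It is not the integer linear programming model proposed in the paper. The abstract does not say which dependency measure is used, so the classic Heuristics Miner measure is assumed here; the function names, the example log, and the threshold value of 0.5 are all hypothetical.

```python
from collections import Counter


def directly_follows(log):
    """Count how often activity a is directly followed by activity b."""
    counts = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def dependency(counts, a, b):
    """Classic Heuristics Miner dependency measure (an assumption here)."""
    ab = counts[(a, b)]
    ba = counts[(b, a)]
    if a == b:
        # Length-one loop variant of the measure.
        return ab / (ab + 1)
    return (ab - ba) / (ab + ba + 1)


def candidate_arcs(log, threshold=0.5):
    """Keep every arc whose dependency value reaches the threshold.

    This mimics the initial, purely local selection step of existing
    methods; it does not optimize over the whole set of arcs.
    """
    counts = directly_follows(log)
    activities = sorted({x for trace in log for x in trace})
    arcs = {}
    for a in activities:
        for b in activities:
            value = dependency(counts, a, b)
            if value >= threshold:
                arcs[(a, b)] = value
    return arcs


# Hypothetical event log: each trace is a list of activity labels.
example_log = [
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
    ["a", "b", "c", "d"],
]
example_arcs = candidate_arcs(example_log)
```

Because each arc is accepted or rejected on its own, nothing in this sketch ensures that every task lies on a path from the initial task to the final task, which is the kind of global property the abstract says an optimization model can enforce through constraints.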
Identifying the split and join patterns of dependency graphs is an essential step in Heuristics Mining process discovery methods. Existing methods determine the split/join patterns (consisting of AND and XOR relations) from the event log information about the activities directly involved in the splits and joins. Hence, they neglect the event log information available for the other activities on the paths from split points to join points. Moreover, the current methods determine the pattern of each split/join separately and do not aim to select the best overall set of patterns. Therefore, the outputs of the existing methods can be non-optimal. Furthermore, the current methods cannot guarantee that for each AND-split there is a matching AND-join, and vice versa. This can leave some split/join patterns incapable of being activated. To handle these issues, this paper, for the first time, presents an integer linear programming model that identifies the optimal split/join patterns with regard to all succession information available in the event log; simultaneously, it ensures that for each AND-split there is at least one matching AND-join, and vice versa. The objective function of the proposed model is inspired by the replay fitness and precision dimensions of process model quality. According to the assessments, the process models obtained by the proposed method are superior to the results of the most prominent methods of determining split/join patterns in terms of replay fitness, precision, simplicity, and matching AND-splits with AND-joins.
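As an illustration of the local decision that the abstract attributes to existing methods, the sketch below classifies a single split using only the activities directly involved in it. It assumes the AND measure commonly associated with the Heuristics Miner; the abstract itself names no formula. The function names, the example logs, and the threshold of 0.1 are hypothetical, and this is not the integer linear programming model proposed in the paper.

```python
from collections import Counter


def directly_follows(log):
    """Count how often activity x is directly followed by activity y."""
    counts = Counter()
    for trace in log:
        for x, y in zip(trace, trace[1:]):
            counts[(x, y)] += 1
    return counts


def and_measure(counts, a, b, c):
    """Assumed Heuristics Miner style measure for the split a -> {b, c}.

    A high value means b and c often directly follow each other in
    either order, which suggests they run concurrently after a.
    """
    interleaved = counts[(b, c)] + counts[(c, b)]
    return interleaved / (counts[(a, b)] + counts[(a, c)] + 1)


def split_pattern(counts, a, b, c, threshold=0.1):
    """Classify one split in isolation as AND or XOR."""
    return "AND" if and_measure(counts, a, b, c) > threshold else "XOR"


# Hypothetical logs: b and c interleave in the first, exclude each
# other in the second.
parallel_log = [["a", "b", "c", "d"], ["a", "c", "b", "d"]]
choice_log = [["a", "b", "d"], ["a", "c", "d"]]

parallel_result = split_pattern(directly_follows(parallel_log), "a", "b", "c")
choice_result = split_pattern(directly_follows(choice_log), "a", "b", "c")
```

Each split is decided separately here, so the sketch shares both limitations described above: it ignores activities that lie between the split and the join, and nothing forces an AND-split at one point to be matched by an AND-join elsewhere.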