Na Fang scite author profile

et al. 2021

Connection Science

Process discovery usually analyses frequent behaviour in event logs to gain an intuitive understanding of processes. However, there are some effective infrequent behaviours that help to improve business processes in real life. Most existing studies either ignore them or treat them as harmful behaviours. To distinguish effective infrequent sequences from noisy activities, this paper proposes an algorithm to analyse the distribution states of activities and the strong transfer relationships between behaviours based on maximum probability paths. The algorithm divides episodic traces into two categories: harmful and useful episodes, namely noisy activities and effective sequences. First, using conditional probability entropy, the infrequent logs are pre-processed to remove individual noisy activities that are extremely irregularly distributed in the traces. Effective sequences are then extracted from the logs based on the state transfer information of the activities. The algorithm is based on a PM4Py implementation and is validated using synthetic and real logs. From the results, the algorithm not only preserves the key structure of the model and reduces noise activity, but also improves the quality of the model.

Online Incremental Mining Based on Trusted Behavior Interval

Fang¹,

Fang²,

Lu³

et al. 2021

IEEE Access

Incremental mining improves the quality of process mining by analyzing the differences between event logs and a reference model to obtain valuable information to update the reference model. Existing incremental mining methods focus on offline logs by setting thresholds for analysis, which limits process mining efforts by the domain knowledge, log completeness, and business completion time. Aiming at these problems, a real-time incremental mining algorithm based on the trusted behavior interval is proposed to analyze online event streams for updating the reference model. First, a clustering technique to analyze an existing reference model selects the core structure of the model and calculates the trusted behavior interval. Then, the behavioral and structural relationships between the online event streams and the reference model are analyzed to obtain a valid candidate set. Based on this set, an incremental update algorithm is proposed to optimize the model structure to achieve an online dynamic update of the reference model. The proposed algorithm is implemented in PM4PY and Scikit-learn frameworks; a reasonable number of clusters is determined using the elbow method and validated with artificial and real data. Experimental results show that the algorithm improves the efficiency of incremental mining and enhances the quality of the model with both complete and incomplete data.

Anomalous Behavior Detection Based on the Isolation Forest Model with Multiple Perspective Business Processes

2022

Electronics

Anomalous behavior detection in business processes inspects abnormal situations, such as errors and missing values in system execution records, to facilitate safe system operation. Since anomaly information hinders the insightful investigation of event logs, many approaches have contributed to anomaly detection in either the business process domain or the data mining domain. However, most of them ignore the impact brought by the interaction between activities and their related attributes. Based on this, a method is constructed to integrate the consistency degree of multi-perspective log features and use it in an isolation forest model for anomaly detection. First, a reference model is captured from the event logs using process discovery. After that, the similarity between behaviors is analyzed based on the neighborhood distance between the logs and the reference model, and the data flow similarity is measured based on the matching relationship of the process activity attributes. Then, the integration consistency measure is constructed. Based on this, the composite log feature vectors are produced by combining the activity sequences and attribute sequences in the event logs and are fed to the isolation forest model for training. Subsequently, anomaly scores are calculated and anomalous behavior is determined based on different threshold-setting strategies. Finally, the proposed algorithm is implemented using the Scikit-learn framework and evaluated in real logs regarding anomalous behavior recognition rate and model quality improvement. The experimental results show that the algorithm can detect abnormal behaviors in event logs and improve the model quality.

PN-BBN: A Petri Net-Based Bayesian Network for Anomalous Behavior Detection

2022

Mathematics

Business process anomalous behavior detection reveals unexpected cases from event logs to ensure the trusted operation of information systems. Anomaly behavior is mainly identified through a log-to-model alignment analysis or numerical outlier detection. However, both approaches ignore the influence of probability distributions or activity relationships in process activities. Based on this concern, this paper incorporates the behavioral relationships characterized by the process model and the joint probability distribution of nodes related to suspected anomalous behaviors. Moreover, a Petri Net-Based Bayesian Network (PN-BBN) is proposed to detect anomalous behaviors based on the probabilistic inference of behavioral contexts. First, the process model is filtered based on the process structure of the process activities to identify the key regions where the suspected anomalous behaviors are located. Then, the behavioral profile of the activity is used to prune it to position the ineluctable paths that trigger these activities. Further, the model is used as the architecture for parameter learning to construct the PN-BBN. Based on this, anomaly scores are inferred based on the joint probabilities of activities related to suspected anomalous behaviors for anomaly detection under the constraints of control flow and probability distributions. Finally, PN-BBN is implemented based on the open-source frameworks PM4PY and PMGPY and evaluated from multiple metrics with synthetic and real process data. The experimental results demonstrate that PN-BBN effectively identifies anomalous process behaviors and improves the reliability of information systems.

Online incremental updating for model enhancement based on multi-perspective trusted intervals

2022

Connection Science