2015 IEEE 31st International Conference on Data Engineering 2015
DOI: 10.1109/icde.2015.7113270

Cleaning structured event logs: A graph repair approach

Abstract: Event data are often dirty owing to various recording conventions or simply system errors. These errors may cause serious damage to real applications, such as inaccurate provenance answers, poor profiling results, or concealed interesting patterns in event data. Cleaning dirty event data is therefore in strong demand. While existing event data cleaning techniques view event logs as sequences, structural information does exist among events. We argue that such structural information enhances not only the a…

Cited by 46 publications (30 citation statements)
References 32 publications
“…In these cases, and if this behavior is infrequent enough to allow the event log to remain meaningful, the most common way for existing process mining techniques to deal with missing data is by filtering out the affected traces and performing discovery and conformance checking on the resulting filtered event log. While filtering out missing values is straightforward, various methodologies of event log filtering have been proposed in the past to solve the problem of incorrect event attributes: the filtering can take place thanks to a reference model, which can be given as process specification [12], or from information discovered from the frequent and well-formed traces of the same event log; for example extracting an automaton from the frequent traces [7], computing conditional probabilities of frequent sequences of activities [9], or discovering a probabilistic automaton [13]. In the latter cases, the noise is identified as infrequent behavior.…”
Section: Related Work (mentioning)
confidence: 99%
“…When queue Q is empty, it indicates that we cannot find candidate paths, and thus return an empty query answer set ∅ (lines 19-20). Otherwise, we will prepare for expanding the length of candidate paths in Q (lines 21-28). Candidate Path Refinement: After the index traversal and path expansion, we can obtain a number of candidate paths in set S_cand. Then, we can refine these paths by checking constraints of keywords, weather, and traveling times, and return actual PRAO answers (line 41).…”
Section: The PRAO Query Procedures (mentioning)
confidence: 99%
“…With respect to noise filtering in the context of event logs, three approaches are described in the literature [9,12,21]. The approach proposed by Wang et al. [21] relies on a reference process model to repair a log whose events are affected by labels that do not match the expected behaviour of the reference model. The approach proposed by Conforti et al. [9] removes events that cannot be reproduced by an automaton constructed using frequent process behaviour recorded in the log.…”
Section: Related Work (mentioning)
confidence: 99%