† The work of this author was supported by NSF Grant CCR-0208709 and NIH Grant R01 GM068959-01.
‡ The work of this author was supported by NSF Grant CCR-0208709 and AFOSR Grant FA 8655-04-1-3074.
We propose a new method for the reliable identification of significant sequential episodes occurring within a window of size w in an event sequence modeled by a Markov source. As a measure of significance we use Ω∃(n, w), the number of windows that contain the episode as a subsequence. We prove that Ω∃(n, w) is a sum of a ϕ-mixing sequence of random variables and therefore obeys the central limit theorem. This leads to a computational formula for a threshold that identifies significant episodes. The novelty of our method for Markov sources is that, instead of scoring the whole sequence with a Markov model, we compute the expected value and variance of Ω∃(n, w), use them to estimate the threshold, and compare the threshold to the observed Ω∃(n, w). Because the performance of the method depends critically on the model structure and parameters, we argue that variable-length Markov models of event streams are superior to fixed-length Markov models. In our experiments we used DNA sequences as event sources and compared the performance of fixed-length Markov models with that of interpolated Markov models. This paper extends our previous work [8, 1], where we considered the reliable detection of significant episodes for memoryless sources.
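The thresholding idea described above can be illustrated with a minimal Python sketch. It is not the paper's computational formula: the mean and variance of Ω∃(n, w) are estimated here by Monte Carlo simulation of a first-order Markov chain, swapped in as a stand-in for the analytic expressions. The function names, the uniform initial state, the window count n - w + 1, and the parameters `beta` and `trials` are all assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean, pvariance


def contains_subsequence(window, episode):
    """True if `episode` occurs in `window` as a (not necessarily contiguous) subsequence."""
    it = iter(window)
    # `symbol in it` advances the iterator, so matches must appear in order.
    return all(symbol in it for symbol in episode)


def count_windows(sequence, episode, w):
    """Observed count: number of length-w windows containing the episode.

    Assumption: windows are the n - w + 1 contiguous slices of the sequence.
    """
    n = len(sequence)
    return sum(
        contains_subsequence(sequence[i:i + w], episode)
        for i in range(n - w + 1)
    )


def simulate_markov(n, alphabet, transition, rng):
    """Draw n symbols from a first-order Markov chain (uniform initial state, an assumption)."""
    state = rng.choice(alphabet)
    out = [state]
    for _ in range(n - 1):
        weights = [transition[state][b] for b in alphabet]
        state = rng.choices(alphabet, weights=weights)[0]
        out.append(state)
    return out


def significance_threshold(n, w, episode, alphabet, transition,
                           beta=0.01, trials=200, seed=0):
    """Normal-approximation threshold: mean + z * standard deviation.

    Mean and variance are Monte Carlo estimates (hypothetical stand-in for
    the analytic formulas); `beta` is the target upper-tail probability.
    """
    rng = random.Random(seed)
    counts = [
        count_windows(simulate_markov(n, alphabet, transition, rng), episode, w)
        for _ in range(trials)
    ]
    z = NormalDist().inv_cdf(1 - beta)
    return mean(counts) + z * pvariance(counts) ** 0.5
```

Under these assumptions, an episode would be flagged as significant when `count_windows(observed_sequence, episode, w)` exceeds the value returned by `significance_threshold` for the same n and w. Simulation is slower than a closed-form computation, but it keeps the sketch independent of any particular formula for the mean and variance.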