In this paper, we analyze the model proposed in García and Londoño¹, in which a set of p independent sequences of discrete-time Markov chains is considered, over a finite alphabet A and with finite order o. The model is obtained by identifying the states of the state space A^o for which two or more sequences share the same transition probabilities (see also García and González‐López²). This identification establishes a partition of {1,…,p}×A^o, the Cartesian product of the set of sequence indices and the state space. We show that, by means of the Bayesian information criterion (BIC), the partition can be estimated eventually almost surely. García and Londoño¹ also give a notion of divergence, derived from the BIC, which serves to identify the proximity or discrepancy between elements of {1,…,p}×A^o (see also García et al.³). In the present article, we prove that this notion is a metric on the space where the model is built and that it is statistically consistent for determining the proximity or discrepancy between elements of {1,…,p}×A^o. We apply these notions to construct a parsimonious model that represents the common stochastic structure of 153 complete genomic Zika sequences from tropical and subtropical regions.
The Partition Markov Model characterizes the process by a partition L of the state space, where the elements in each part of L share the same transition probability to an arbitrary element of the alphabet. This model aims to answer the following questions: what is the minimal number of parameters needed to specify a Markov chain, and how can these parameters be estimated? To answer these questions, we build a consistent strategy for model selection, which consists of the following: given a realization of the process of size n, find a model within the Partition Markov class with a minimal number of parts that represents the law of the process. From this strategy, we derive a measure that establishes a metric on the state space. In addition, we show that if the law of the process is Markovian, then L is eventually recovered as n goes to infinity. We show an application to modeling internet navigation patterns.
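To make the selection criterion concrete, the following minimal sketch scores a candidate partition of the state space with a BIC-type criterion. It assumes the usual penalized form, maximized log-likelihood minus (|A|-1)|L|/2 times ln n, which the abstract does not state explicitly. The function names `transition_counts` and `bic_of_partition` are hypothetical and are not taken from the cited papers.

```python
import math
from collections import Counter


def transition_counts(seq, order):
    """Count occurrences of each (state, next symbol) pair.

    A state is the tuple of the `order` symbols preceding the next symbol.
    """
    counts = Counter()
    for i in range(order, len(seq)):
        state = tuple(seq[i - order:i])
        counts[(state, seq[i])] += 1
    return counts


def bic_of_partition(seq, order, alphabet, partition):
    """BIC-type score of a partition of the state space (assumed form).

    `partition` is a list of parts; each part is a list of states (tuples).
    States in the same part are treated as sharing one transition law.
    """
    counts = transition_counts(seq, order)
    n = len(seq)
    loglik = 0.0
    for part in partition:
        # Pool the transition counts of all states in this part.
        pooled = {a: sum(counts[(s, a)] for s in part) for a in alphabet}
        total = sum(pooled.values())
        for a in alphabet:
            if pooled[a] > 0:
                loglik += pooled[a] * math.log(pooled[a] / total)
    # Assumed penalty: (|A| - 1) free parameters per part.
    penalty = (len(alphabet) - 1) * len(partition) / 2 * math.log(n)
    return loglik - penalty


seq = "0101010101"
separate = [[("0",)], [("1",)]]   # each state is its own part
merged = [[("0",), ("1",)]]       # both states share one part
bic_separate = bic_of_partition(seq, 1, "01", separate)
bic_merged = bic_of_partition(seq, 1, "01", merged)
```

On this alternating sequence the two states have opposite transition laws, so the partition that keeps them separate receives the higher score, even though it uses more parameters. A selection procedure in this spirit would compare such scores across candidate partitions and retain the maximizer.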
In this paper, we propose a procedure for selecting samples from a collection of samples coming from Markovian processes of finite order over a finite alphabet. Under the assumption that a single law prevails in at least q% of the samples in the collection, we show that the procedure makes it possible to identify the samples governed by the predominant law. The approach is based on a local metric between samples, which tends to zero when the samples compared have identical laws and tends to infinity when they have different laws. The local metric also makes it possible to define a criterion that takes arbitrarily large values when the assumption of a predominant law does not hold. By means of this procedure, we map similarities and dissimilarities in the daily trading volume dynamics of some Brazilian stocks.