For the purpose of model specification, we study the asymptotic behavior of the marginal quasi-log likelihood associated with a family of locally asymptotically quadratic (LAQ) statistical experiments. Our result substantially extends the scope of the classical approximate Bayesian model comparison due to Schwarz, and gives it a frequentist theoretical foundation. In particular, the proposed statistics can deal with both ergodic and non-ergodic stochastic-process models, where the corresponding M-estimator is of multi-scaling type and the asymptotic quasi-information matrix is random. Focusing on the ergodic diffusion model, we also deduce the consistency of multistage optimal-model selection, in which an optimal sub-model structure is selected step by step, so that the computational cost is much reduced. We illustrate the proposed method in detail with the Gaussian quasi-likelihood for diffusion-type models, together with several numerical experiments.

Date: November 5, 2018.

Key words and phrases. Approximate Bayesian model comparison, Gaussian quasi-likelihood, locally asymptotically quadratic family, quasi-likelihood, Schwarz's criterion.
• The measurable function $x \mapsto H_{m,n}(x|\theta_m)$ for each $\theta_m \in \Theta_m$ defines a logarithmic regular conditional probability density of $\mathcal{L}(X_n|\theta_m)$ with respect to $\mu_n(dx)$.

Each $\mathcal{M}_m$ may be misspecified in the sense that the true data-generating model $g_n(x)$ does not belong to the family $\{\exp\{H_{m,n}(\cdot|\theta_m)\} \mid \theta_m \in \Theta_m\}$; we will, however, assume regularity conditions under which the associated statistical random fields have suitable asymptotic behavior.

Concerning the model $\mathcal{M}_m$, the random function $\theta_m \mapsto \exp\{H_{m,n}(X_n|\theta_m)\}$, assumed to be a.s. well-defined, is referred to as the quasi-likelihood of $\mathcal{L}(X_n|\theta_m)$. The quasi-maximum likelihood estimator (QMLE) $\hat{\theta}_{m,n}$ associated with $H_{m,n}$ is defined to be any maximizer of $H_{m,n}$:

$$\hat{\theta}_{m,n} \in \operatorname*{argmax}_{\theta_m \in \overline{\Theta}_m} H_{m,n}(X_n|\theta_m).$$

For simplicity we will assume the a.s. continuity of $H_{m,n}$ over the compact set $\overline{\Theta}_m$, so that $\hat{\theta}_{m,n}$ always exists.

Our objective includes estimators of multi-scaling type, meaning that the components of $\hat{\theta}_{m,n}$ converge at different rates, which often occurs under high-frequency asymptotics. A typical example is the Gaussian quasi-likelihood estimation of an ergodic diffusion process: see [24], also Section 4.2. Let $K_m \in \mathbb{N}$ be a given number, which represents the number of components having different convergence rates in $\mathcal{M}_m$, and assume that the $m$th-model parameter vector is divided into $K_m$ parts:

$$\theta_m = (\theta_{m,1}, \dots, \theta_{m,K_m}), \qquad \theta_{m,k} \in \Theta_{m,k}.$$

Then the QMLE in the $m$th model takes the form $\hat{\theta}_{m,n} = (\hat{\theta}_{m,1,n}, \dots, \hat{\theta}_{m,K_m,n})$. The optimal value of $\theta_m$ associated with $H_{m,n}$, to be precisely defined later on, is denoted by $\theta_{m,0} = (\theta_{m,1,0}, \dots, \theta_{m,K_m,0})$, $\theta_{m,k,0} \in \Theta_{m,k}$. The rate matrix in the model $\mathcal{M}_m$ is then given in the form

$$A_{m,n}(\theta_{m,0}) = \operatorname{diag}\bigl( a_{m,1,n}(\theta_{m,0}) I_{p_{m,1}}, \dots, a_{m,K_m,n}(\theta_{m,0}) I_{p_{m,K_m}} \bigr), \tag{2.1}$$

where $I_p$ denotes the $p$-dimensional identity matrix and $a_{m,k,n}(\theta_{m,0})$ is a deterministic sequence satisfying $a_{m,k,n}(\theta_{m,0}) \to 0$, $a_{m,i,n}(\theta_{m,0})/a_{m,j,n}$...