Recent studies on visual tracking have shown significant improvement in accuracy by handling the appearance variations of the target object. Whereas most studies present schemes to extract the time-invariant characteristics of the target and adaptively update the appearance model, the present paper concentrates on modeling the probabilistic dependency between sequential target appearances ( Fig. 1-(a)). To actualize this interest, a new Bayesian tracking framework is formulated under the autoregressive Hidden Markov Model (AR-HMM), where the probabilistic dependency between sequential target appearances is implied. During the learning phase at each time step, the proposed tracker separates formerly seen target samples into several clusters based on their visual similarity, and learns clusterspecific classifiers as multiple appearance models, each of which represents a certain type of the target appearance. Then the dependency between these appearance models is learned. During the searching phase, the target state is estimated by inferring the most probable appearance model under the consideration of its dependency on formerly utilized appearance models. The proposed method is tested on 12 challenging video sequences containing targets with abrupt appearance variations, and demonstrates that it outperforms current state-of-the-art methods in accuracy.