“…Single-target visual tracking has long attracted considerable research effort [39]. It is impractical to enumerate all previous work; instead, we sample some recent directions related to ours: i) linear representation with a dictionary, e.g., a set of basis vectors based on subspace learning [29,12] or least soft-threshold squares linear regression [32], or a series of raw pixel templates based on sparse coding [25,24,44,43,36] or non-sparse linear representation [22]; ii) collaboration of multiple tracking models, e.g., Interacting Markov Chain Monte Carlo (MCMC) based [17,18,19] and local/global combination based [45]; iii) part-based models, e.g., fragment voting [1,9,5], spatial constraints between the parts [42,37], and alignment-pooling across local patches [14]; and iv) the widely followed tracking-by-detection (or discriminative) methods [6,7,20,2,8,21,31,45], which treat tracking as a classification task. All these trackers adaptively update their tracking models to accommodate appearance changes and new information during tracking.…”