“…To address this problem, Cai et al. [1] model the superpixels and their relationships as an undirected graph and formulate target tracking as a graph matching problem. Xie et al. [19] employ a minimum spanning tree to model the geometric structural constraints between superpixels, and infer the target state by voting from confident parts of the target, based on appearance similarity and those structural constraints. In addition to leveraging the spatial relationships between local parts, Li et al. [5] construct a relational hypergraph that models the high-order relationships among multiple local parts across the temporal domain, and identify temporally coherent parts for the target representation.…”
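
To make the minimum-spanning-tree idea concrete, the following is a minimal sketch of how a tree over superpixels might be built. It is not the implementation of Xie et al. [19]. It assumes each superpixel is summarized by its centroid and that edge weights are Euclidean distances between centroids; the function name `minimum_spanning_tree` and the sample `centroids` are hypothetical and chosen only for illustration.

```python
import heapq
import math


def minimum_spanning_tree(centroids):
    """Return MST edges (i, j, weight) over the complete graph on the
    given points, using Prim's algorithm with a binary heap.

    Assumption: edge weight = Euclidean distance between centroids.
    """
    n = len(centroids)
    if n == 0:
        return []

    visited = [False] * n
    visited[0] = True  # grow the tree from the first superpixel

    # Candidate edges leaving the tree, stored as (weight, from, to).
    heap = [(math.dist(centroids[0], centroids[j]), 0, j) for j in range(1, n)]
    heapq.heapify(heap)

    edges = []
    while heap and len(edges) < n - 1:
        weight, i, j = heapq.heappop(heap)
        if visited[j]:
            continue  # this edge would close a cycle
        visited[j] = True
        edges.append((i, j, weight))
        # Offer edges from the newly added node to every unvisited node.
        for k in range(n):
            if not visited[k]:
                heapq.heappush(heap, (math.dist(centroids[j], centroids[k]), j, k))
    return edges


# Hypothetical superpixel centroids (x, y); not taken from any dataset.
centroids = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
tree = minimum_spanning_tree(centroids)
```

For `n` superpixels the tree keeps only `n - 1` edges, which is one plausible reason to prefer it over a dense graph when encoding geometric structure. A tracker in this style would presumably combine such edges with appearance similarity before voting, but that step is not sketched here.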