In this study, the authors propose an object proposal algorithm that accurately proposes object candidate regions in each frame of a video, even in the presence of noise. To this end, they define three orthogonal planes: the vertical-horizontal, temporal-vertical, and temporal-horizontal planes. As these planes are orthogonal, they are the most compact set of planes that can span the spatiotemporal space of a video. Their algorithm selects good object proposals for the vertical-horizontal plane with the help of the object proposal results from the other two planes. Experimental results demonstrate that the proposed algorithm produces better object proposals than the baseline algorithm and other state-of-the-art methods. In particular, their method provides more accurate object proposals in challenging environments with severe noise and background clutter. In addition, the object proposal results are utilised for visual tracking, and the experimental results show that the resulting visual tracker outperforms recent deep-learning-based trackers.
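
The geometry of the three orthogonal planes can be illustrated with a minimal sketch. It is not the authors' implementation and contains no proposal generation or selection; it only shows how one vertical-horizontal, one temporal-vertical, and one temporal-horizontal slice might be cut from a video volume. The layout `video[t][y][x]`, the toy voxel values, and all function names are assumptions made for this illustration.

```python
# Illustrative sketch only: slicing a video volume along three orthogonal planes.
# Assumed layout (not taken from the paper): video[t][y][x], with t = time,
# y = vertical position, x = horizontal position.

def make_video(num_frames, height, width):
    """Build a toy video whose voxel value is its own (t, y, x) coordinate."""
    return [[[(t, y, x) for x in range(width)]
             for y in range(height)]
            for t in range(num_frames)]

def vertical_horizontal_plane(video, t):
    """Ordinary frame at time t: rows index y, columns index x."""
    return [row[:] for row in video[t]]

def temporal_vertical_plane(video, x):
    """Slice at a fixed column x: rows index t, columns index y."""
    return [[frame[y][x] for y in range(len(frame))] for frame in video]

def temporal_horizontal_plane(video, y):
    """Slice at a fixed row y: rows index t, columns index x."""
    return [frame[y][:] for frame in video]

video = make_video(num_frames=4, height=3, width=5)
vh = vertical_horizontal_plane(video, t=2)   # 3 rows (y) by 5 columns (x)
tv = temporal_vertical_plane(video, x=1)     # 4 rows (t) by 3 columns (y)
th = temporal_horizontal_plane(video, y=0)   # 4 rows (t) by 5 columns (x)
```

In this toy layout the three slices intersect at the voxel (t=2, y=0, x=1), which is the sense in which evidence from the two temporal planes can refer to a location in a given frame.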