Abstract. This paper addresses the problem of multi-frame, multi-target video tracking. Unlike recent approaches that use only unary and pairwise costs, we propose a solution based on three-frame tracklets, which lets us exploit constant-velocity motion constraints while keeping computation time low. Tracklets are computed within a sliding window of frame triplets, each of which overlaps its neighboring triplets by two frames. Any inconsistencies among these local tracklet solutions are resolved by considering a larger temporal window, and the remaining tracklets are then merged globally using a min-cost network flow formulation. The result is a set of high-quality trajectories that can span gaps caused by missed detections and long-term occlusions. Experimental results show good performance in complex scenes.
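The sliding-window construction and the three-frame motion constraint can be illustrated with a minimal sketch. The abstract does not give the exact cost, so this assumes one common choice: the norm of the second difference p0 - 2*p1 + p2. That quantity is zero exactly when the displacement from frame t to t+1 equals the displacement from t+1 to t+2, which is constant velocity. The names `triplet_windows`, `constant_velocity_cost`, and `candidate_tracklets` and the threshold `max_cost` are hypothetical. Exhaustive enumeration with a threshold is a simplification for illustration. It is not the paper's solver, which also resolves conflicts over a larger window and links tracklets with min-cost network flow.

```python
import numpy as np
from itertools import product


def triplet_windows(num_frames):
    # Each window is (t, t+1, t+2); consecutive windows share two frames.
    return [(t, t + 1, t + 2) for t in range(num_frames - 2)]


def constant_velocity_cost(p0, p1, p2):
    # Assumed cost: norm of the second difference p0 - 2*p1 + p2.
    # It is zero when p1 - p0 == p2 - p1, i.e. constant velocity.
    p0, p1, p2 = (np.asarray(p, dtype=float) for p in (p0, p1, p2))
    return float(np.linalg.norm(p0 - 2.0 * p1 + p2))


def candidate_tracklets(dets_by_frame, window, max_cost):
    # Enumerate every detection triple (i, j, k) in the window and keep those
    # whose constant-velocity cost is at most the hypothetical max_cost.
    f0, f1, f2 = window
    kept = []
    for i, j, k in product(range(len(dets_by_frame[f0])),
                           range(len(dets_by_frame[f1])),
                           range(len(dets_by_frame[f2]))):
        cost = constant_velocity_cost(dets_by_frame[f0][i],
                                      dets_by_frame[f1][j],
                                      dets_by_frame[f2][k])
        if cost <= max_cost:
            kept.append(((i, j, k), cost))
    # Cheapest (most constant-velocity) tracklets first; sort is stable.
    return sorted(kept, key=lambda item: item[1])


if __name__ == "__main__":
    # Two targets moving in straight lines at constant speed over three frames.
    dets = [[(0, 0), (10, 10)], [(1, 1), (10, 12)], [(2, 2), (10, 14)]]
    for window in triplet_windows(len(dets)):
        print(window, candidate_tracklets(dets, window, max_cost=0.5))
```

On this toy input, only the two physically consistent triples, (0, 0, 0) and (1, 1, 1), survive the threshold, each with cost 0.0. Any triple that switches identities between frames breaks the constant-velocity assumption and is rejected. This is the kind of constraint that pairwise costs alone cannot express.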