A common problem in video-based tracking of urban targets is occlusion by buildings and vehicles. Fortunately, when multiple video sensors are present with sufficient geometric diversity, track breaks caused by temporary occlusion can be substantially reduced by correlating and fusing source-level track data into system-level tracks. Furthermore, in a communication-constrained environment it is preferable to transmit track data rather than raw video or detection measurements. To avoid statistical correlation arising from common prior information, tracklets can be formed from the source tracks before they are transmitted to a central command node, which is then responsible for maintaining the system tracks through correlation and fusion. To maximize the operational benefit of the system-level track picture, it should be distributed efficiently to all platforms, especially to the local trackers at the sensors. In this paper, we describe a centralized architecture for multi-sensor video tracking that uses tracklet-based feedback to maintain an accurate and complete track picture at all platforms. Using challenging synthetic video data, we also demonstrate that our architecture improves track completeness, enhances track continuity in the presence of occlusions, and reduces track initiation time at the local trackers.
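To make the idea of removing common prior information concrete, the following is a minimal sketch of one well-known way to form a tracklet: information decorrelation, where the information contained in the last transmitted track state is subtracted from the current local track. This is offered as an illustration under simplifying assumptions, not as the specific method used in this paper. It assumes a scalar (1-D) state, no process noise between transmissions, and Gaussian measurement noise with variance `r`; the function names `kalman_update`, `tracklet_information`, and `equivalent_measurement` are hypothetical.

```python
"""Sketch: forming a decorrelated tracklet from a local track (scalar case).

Assumptions (hypothetical, for illustration only): 1-D state, no process
noise between transmissions, Gaussian measurement noise with variance r.
"""


def kalman_update(x: float, p: float, z: float, r: float) -> tuple[float, float]:
    """Standard scalar Kalman measurement update; returns (mean, variance)."""
    k = p / (p + r)  # Kalman gain
    return x + k * (z - x), (1.0 - k) * p


def tracklet_information(x_now: float, p_now: float,
                         x_prior: float, p_prior: float) -> tuple[float, float]:
    """Subtract the prior (last-transmitted) information from the current track.

    Returns the information-state and information (inverse variance) gained
    since the last transmission, so the central node does not double-count
    the common prior.
    """
    info = 1.0 / p_now - 1.0 / p_prior
    info_state = x_now / p_now - x_prior / p_prior
    return info_state, info


def equivalent_measurement(info_state: float, info: float) -> tuple[float, float]:
    """Convert tracklet information into a measurement-like (z, r) pair."""
    if info <= 0.0:
        raise ValueError("no new information since the last transmission")
    return info_state / info, 1.0 / info


if __name__ == "__main__":
    # Last transmitted track state; with no process noise its prediction is unchanged.
    x_prior, p_prior = 0.0, 100.0
    x, p = x_prior, p_prior
    for z in (1.0, 1.2, 0.8):  # local detections since the last transmission
        x, p = kalman_update(x, p, z, r=0.5)
    i_state, info = tracklet_information(x, p, x_prior, p_prior)
    z_eq, r_eq = equivalent_measurement(i_state, info)
    print(f"tracklet: z_eq={z_eq:.3f}, r_eq={r_eq:.3f}")
```

Under the no-process-noise assumption, the recovered information equals the sum of the measurement information since the last transmission (here 3 / 0.5 = 6), and the equivalent measurement is the mean of the three detections, 1.0, with variance 1/6. This is what lets the central node fuse the tracklet without double-counting the shared prior; when process noise is present, this decorrelation is only approximate.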