Surveillance and ground target tracking using multiple electro-optical and infrared video sensors onboard unmanned aerial vehicles (UAVs) have drawn a great deal of interest in recent years. We compare a number of track-to-track fusion algorithms for a scenario with two UAVs and a single target that follows the nearly constant velocity dynamic model. A local tracker is associated with each UAV and processes video measurements to produce local tracks. Each video measurement is the centroid pixel location in the digital image corresponding to the target position on the ground. To handle arbitrary height variations, we use the perspective transformation for the video measurement model, which also includes radial and tangential lens distortions, scale, and offset. Because the video measurement model is a nonlinear function of the target position, the tracking filter uses a nonlinear filtering algorithm. A fusion center fuses the track data received from the two local trackers. The track-to-track fusion algorithms employed by the fusion center include the simple convex combination fusion, Bhattacharya fusion, Bar-Shalom-Campo fusion, and extended information filter based fusion algorithms. We compare the algorithms in terms of fusion accuracy, covariance consistency, bias in the fused estimate, communication load requirements, and scalability. Numerical results are presented using simulated data.
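As an illustration of the simplest of the fusion rules named above, the following is a minimal sketch of convex combination fusion in its standard textbook form, which weights each local estimate by its inverse covariance and ignores the cross-covariance between the two local tracks. The two-dimensional position-only state, the helper names (`inv2`, `add2`, `matvec2`, `convex_combination_fuse`), and the numerical values are assumptions made for illustration and are not taken from the paper.

```python
def inv2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


def add2(m, n):
    """Element-wise sum of two 2x2 matrices."""
    return [[m[i][j] + n[i][j] for j in range(2)] for i in range(2)]


def matvec2(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [m[i][0] * v[0] + m[i][1] * v[1] for i in range(2)]


def convex_combination_fuse(x1, P1, x2, P2):
    """Fuse two local estimates (x1, P1) and (x2, P2).

    P_f = (P1^-1 + P2^-1)^-1
    x_f = P_f (P1^-1 x1 + P2^-1 x2)
    The cross-covariance between the local tracks is ignored.
    """
    I1, I2 = inv2(P1), inv2(P2)          # information matrices
    P_f = inv2(add2(I1, I2))             # fused covariance
    y = [a + b for a, b in zip(matvec2(I1, x1), matvec2(I2, x2))]
    x_f = matvec2(P_f, y)                # fused state estimate
    return x_f, P_f


# Hypothetical local tracks from two UAVs (ground position in metres).
x1, P1 = [10.0, 20.0], [[4.0, 0.0], [0.0, 1.0]]
x2, P2 = [12.0, 19.0], [[1.0, 0.0], [0.0, 4.0]]
x_f, P_f = convex_combination_fuse(x1, P1, x2, P2)
print(x_f, P_f)
```

In this example each tracker is more certain along a different axis, so the fused estimate lies closer to the second track in the first coordinate and closer to the first track in the second coordinate. Because the rule treats the local tracks as independent, the fused covariance it reports can be optimistic when the tracks are correlated, for instance through common process noise.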