We present an algorithm that fuses data from a constellation of RF sensors detecting cellular emanations with the output of a multi-spectral video tracker to localize and track a target carrying a specific cell phone. The RF sensors measure the Doppler shift of the moving cellular emitter, and Doppler differentials are then computed between all sensor pairs. The multi-spectral video tracker uses a Gaussian mixture model to detect foreground targets and SIFT features to track those targets through the video sequence. The two data sources are fused by associating the Doppler differentials measured by the RF sensors with the theoretical Doppler differentials computed from the multi-spectral tracker output. The association is scored using the absolute difference and the root-mean-square difference between the measured and theoretical Doppler differentials. Performance of the algorithm was evaluated on synthetically generated datasets of an urban scene with multiple moving vehicles. The fusion algorithm correctly associates the cellular emanation with the corresponding video target when measurement uncertainty is low and motion patterns are favorable. For nearly all objects, the algorithm distinguishes the correct multi-spectral target from the most probable background target with high confidence.
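The association step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the sensors are stationary, positions and velocities are given in a shared 2-D Cartesian frame in metres and metres per second, the carrier frequency `carrier_hz` is known, and residuals are pooled over all sensor pairs and time steps before averaging. All function names (`doppler_shift`, `doppler_differentials`, `difference_scores`, `associate`) are hypothetical.

```python
import math
from itertools import combinations

C = 299_792_458.0  # speed of light in m/s


def doppler_shift(pos, vel, sensor, carrier_hz):
    """Doppler shift (Hz) at a stationary sensor for an emitter at pos moving with vel.

    Sign convention (an assumption): positive when the emitter approaches the sensor.
    """
    los = [s - p for s, p in zip(sensor, pos)]  # line of sight, emitter -> sensor
    rng = math.sqrt(sum(d * d for d in los))
    closing_speed = sum(v * d for v, d in zip(vel, los)) / rng
    return carrier_hz * closing_speed / C


def doppler_differentials(pos, vel, sensors, carrier_hz):
    """Theoretical Doppler differentials for every sensor pair (i < j)."""
    shifts = [doppler_shift(pos, vel, s, carrier_hz) for s in sensors]
    return [shifts[i] - shifts[j] for i, j in combinations(range(len(sensors)), 2)]


def difference_scores(measured, track, sensors, carrier_hz):
    """Return (mean absolute difference, RMS difference) in Hz for one video track.

    measured: list over time of measured per-pair differentials
    track:    list over time of (pos, vel) tuples from the video tracker
    """
    residuals = []
    for meas_t, (pos, vel) in zip(measured, track):
        pred_t = doppler_differentials(pos, vel, sensors, carrier_hz)
        residuals.extend(m - p for m, p in zip(meas_t, pred_t))
    n = len(residuals)
    mean_abs = sum(abs(r) for r in residuals) / n
    rms = math.sqrt(sum(r * r for r in residuals) / n)
    return mean_abs, rms


def associate(measured, tracks, sensors, carrier_hz):
    """Pick the track whose predicted differentials best match the measurements.

    Ranking by RMS difference is an illustrative choice, not taken from the source.
    """
    scores = {tid: difference_scores(measured, trk, sensors, carrier_hz)
              for tid, trk in tracks.items()}
    best = min(scores, key=lambda tid: scores[tid][1])
    return best, scores
```

Calling `associate(measured, tracks, sensors, carrier_hz)` with `tracks` as a dictionary mapping a track identifier to its list of `(pos, vel)` samples returns the best-matching identifier together with the per-track scores. The gap between the best and second-best score could serve as a rough confidence measure, although the abstract does not say how confidence is actually computed.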