Target tracking technology that is based on aerial videos is widely used in many fields; however, this technology has challenges, such as image jitter, target blur, high data dimensionality, and large changes in the target scale. In this paper, the research status of aerial video tracking and the characteristics, background complexity and tracking diversity of aerial video targets are summarized. Based on the findings, the key technologies that are related to tracking are elaborated according to the target type, number of targets and applicable scene system. The tracking algorithms are classified according to the type of target, and the target tracking algorithms that are based on deep learning are classified according to the network structure. Commonly used aerial photography datasets are described, and the accuracies of commonly used target tracking methods are evaluated in an aerial photography dataset, namely, UAV123, and a long-video dataset, namely, UAV20L. Potential problems are discussed, and possible future research directions and corresponding development trends in this field are analyzed and summarized.