Localization and perception play an important role as the basis of autonomous Unmanned Aerial Vehicle (UAV) applications, providing the internal state of movements and the external understanding of environments. Simultaneous Localization And Mapping (SLAM), one of the critical techniques for localization and perception, is facing technical upgrading, due to the development of embedded hardware, multi-sensor technology, and artificial intelligence. This survey aims at the development of visual SLAM and the basis of UAV applications. The solutions to critical problems for visual SLAM are shown by reviewing state-of-the-art and newly presented algorithms, providing the research progression and direction in three essential aspects: real-time performance, texture-less environments, and dynamic environments. Visual–inertial fusion and learning-based enhancement are discussed for UAV localization and perception to illustrate their role in UAV applications. Subsequently, the trend of UAV localization and perception is shown. The algorithm components, camera configuration, and data processing methods are also introduced to give comprehensive preliminaries. In this paper, we provide coverage of visual SLAM and its related technologies over the past decade, with a specific focus on their applications in autonomous UAV applications. We summarize the current research, reveal potential problems, and outline future trends from academic and engineering perspectives.