We propose a novel 6D pose estimation approach tailored for auto-landing fixed-wing unmanned aerial vehicles (UAVs). The method tracks position and attitude simultaneously using a ground-based vision system with an arbitrary number of cameras (N cameras), even in Global Navigation Satellite System (GNSS)-denied environments. The pipeline consists of Convolutional Neural Network (CNN)-based detection of UAV anchors, which in turn drives the estimation of the UAV pose. To ensure robust and precise anchor detection, we designed a Block-CNN architecture that mitigates the influence of outliers. Using the detected anchors, an Extended Kalman Filter (EKF) continuously updates the UAV’s position and attitude. To support our research, we set up both monocular and stereo outdoor ground-view systems for data collection and experimentation. Additionally, to expand our training dataset without requiring extra outdoor experiments, we created a parallel system that combines outdoor and simulated setups with identical configurations. We conducted a series of simulated and outdoor experiments. The results show that, compared with the baselines, our method improves anchor detection precision by 3.0% and the accuracy of position and attitude estimation by 19.5% and 12.7%, respectively. These experiments also confirm the practicality of the proposed architecture and algorithm, which meet the stringent accuracy and real-time requirements of auto-landing fixed-wing UAVs.
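To make the anchor-to-pose step concrete, the following is a minimal sketch of a single EKF measurement update that fuses anchor pixel detections from N ground cameras into a 6D pose. It is not the filter used in this work. The state layout (position plus ZYX Euler angles), the pinhole cameras with axes aligned to the world frame and principal point at the image origin, the finite-difference Jacobian, and all names and numbers (`ANCHORS_BODY`, `CAM_POSITIONS`, `FOCAL_PX`) are assumptions chosen purely for illustration.

```python
import numpy as np

def euler_to_rot(roll, pitch, yaw):
    """Body-to-world rotation from ZYX Euler angles in radians (assumed convention)."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def project_anchors(state, anchors_body, cam_positions, focal_px):
    """Measurement model: stacked (u, v) pixels of every anchor in every camera."""
    R = euler_to_rot(*state[3:6])
    anchors_world = state[:3] + anchors_body @ R.T        # shape (K, 3)
    pixels = []
    for cam in cam_positions:
        rel = anchors_world - cam                         # camera axes = world axes (assumption)
        pixels.append(focal_px * rel[:, :2] / rel[:, 2:3])  # pinhole projection along +Z
    return np.concatenate(pixels).ravel()                 # shape (2 * N * K,)

def numerical_jacobian(h, x, eps=1e-6):
    """Central-difference Jacobian of h at x (stand-in for an analytic Jacobian)."""
    J = np.zeros((h(x).size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (h(x + dx) - h(x - dx)) / (2 * eps)
    return J

def ekf_update(x, P, z, R, h):
    """Standard EKF measurement update for state x with covariance P."""
    H = numerical_jacobian(h, x)
    y = z - h(x)                                  # innovation in pixels
    S = H @ P @ H.T + R                           # innovation covariance
    K = np.linalg.solve(S, H @ P).T               # Kalman gain P H^T S^-1 (S, P symmetric)
    x_new = x + K @ y
    P_new = (np.eye(x.size) - K @ H) @ P
    return x_new, P_new

# Hypothetical setup: four anchors (nose, wingtips, tail top) in metres, two cameras.
ANCHORS_BODY = np.array([[1.0, 0.0, 0.0], [0.0, -1.5, 0.0],
                         [0.0, 1.5, 0.0], [-1.0, 0.0, -0.5]])
CAM_POSITIONS = np.array([[-5.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
FOCAL_PX = 1000.0

def h(x):
    return project_anchors(x, ANCHORS_BODY, CAM_POSITIONS, FOCAL_PX)

x_true = np.array([2.0, -1.0, 50.0, 0.05, -0.10, 0.30])  # [px, py, pz, roll, pitch, yaw]
x_est = np.array([2.5, -0.5, 52.0, 0.0, 0.0, 0.20])      # prior estimate
P = np.diag([4.0, 4.0, 25.0, 0.1, 0.1, 0.1])             # prior covariance
R = np.eye(16)                                           # 1 px^2 noise per pixel coordinate
z = h(x_true)                                            # noise-free detections for the demo

x_upd, P_upd = ekf_update(x_est, P, z, R, h)
print("pose error before:", np.linalg.norm(x_est - x_true))
print("pose error after: ", np.linalg.norm(x_upd - x_true))
```

Because the measurement vector simply stacks the pixels from every camera, the same update handles monocular, stereo, or larger camera sets by changing only `CAM_POSITIONS`. In a tracking loop, a prediction step with a motion model would precede each update. That step is omitted here.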