In spacecraft rendezvous and docking, traditional methods that rely on inertial navigation and sensor data face challenges from sensor inaccuracies, noise, and the absence of complementary verification approaches. To explore a new assistive approach, this study presents the first application of deep-learning-based image feature matching to spacecraft docking tasks, introducing the Class-Tuned Invariant Feature Transformer (CtIFT) algorithm. CtIFT incorporates an improved cross-attention mechanism and a custom-designed feature classification module. Its symmetric multi-layer cross-attention progressively strengthens the perception of inter-feature relationships, and in the feature matcher, feature classification reduces the computational load, enabling high-precision matching. The model is trained on multi-source datasets to improve its adaptability to complex environments. The method demonstrates outstanding performance in experiments on four spacecraft docking video scenes, where CtIFT proves the only feasible solution against SIFT and eight state-of-the-art network methods: D2-Net, SuperPoint, SuperGlue, LightGlue, ALIKED, LoFTR, ASpanFormer, and TopicFM+. The number of successfully matched feature points per frame consistently reaches the hundreds, the success rate remains 100%, and the average processing time stays below 0.18 s per frame, overall performance that far exceeds the other methods. These results indicate that the approach achieves strong matching accuracy and robustness in optical docking imaging, supports real-time processing, and offers new technical support for assisting spacecraft rendezvous and docking tasks.
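To make the symmetric multi-layer cross-attention concrete, the sketch below shows one plausible reading of that design in PyTorch: two feature sets, one per image, repeatedly attend to each other with residual updates. This is a minimal illustration under stated assumptions, not the paper's implementation; the class names (`SymmetricCrossAttention`, `CrossAttentionStack`) and hyperparameters (`dim=256`, `num_layers=4`, 4 heads) are hypothetical.

```python
import torch
import torch.nn as nn

class SymmetricCrossAttention(nn.Module):
    """One symmetric cross-attention layer: each feature set attends to the other."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn_ab = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_b = nn.LayerNorm(dim)

    def forward(self, feats_a, feats_b):
        # Features from image A query features from image B, and vice versa.
        upd_a, _ = self.attn_ab(feats_a, feats_b, feats_b)
        upd_b, _ = self.attn_ba(feats_b, feats_a, feats_a)
        # Residual update keeps each set anchored to its own descriptors.
        return self.norm_a(feats_a + upd_a), self.norm_b(feats_b + upd_b)

class CrossAttentionStack(nn.Module):
    """Stacked symmetric layers, progressively strengthening inter-feature relations."""
    def __init__(self, dim: int = 256, num_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            SymmetricCrossAttention(dim) for _ in range(num_layers)
        )

    def forward(self, feats_a, feats_b):
        for layer in self.layers:
            feats_a, feats_b = layer(feats_a, feats_b)
        return feats_a, feats_b

# Example: 500 descriptors of dimension 256 per image (batch size 1).
model = CrossAttentionStack(dim=256, num_layers=4)
a, b = torch.randn(1, 500, 256), torch.randn(1, 500, 256)
out_a, out_b = model(a, b)  # refined, mutually conditioned descriptors
```

In this reading, each layer conditions one image's descriptors on the other's before matching, which is what would let relationship perception strengthen gradually across layers; CtIFT's actual layer internals and its classification-based matcher are described in the body of the paper, not here.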