Space situational awareness (SSA) systems require the recognition of space objects that vary in size, shape, and type. Space images are challenging because factors such as illumination and noise complicate the recognition task. Image fusion is an important area of image processing, with applications including RGB-D sensor fusion, remote sensing, medical diagnostics, and infrared and visible image fusion. Recently, various image fusion algorithms have been developed, and they have shown superior performance in exploiting information that is not available in single images. In this paper, we compare various methods of RGB and depth image fusion for the space object classification task. Experiments were carried out, and performance was evaluated using 13 fusion performance metrics. Guided filter context enhancement (GFCE) outperformed the other image fusion methods in terms of average gradient (8.2593), spatial frequency (28.4114), and entropy (6.9486). Additionally, because it balances good performance with inference speed (11.41 seconds), GFCE was selected for the RGB and depth image fusion stage that precedes feature extraction and classification. The fused images were used to train a deep ensemble of CoAtNets to classify space objects into ten categories. Deep ensemble learning methods, including bagging, boosting, and stacking, were trained and evaluated for classification. The combination of fusion and stacking substantially improved classification accuracy over the baseline methods, achieving an average accuracy of 89 % and an average F1 score of 89 %.
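For readers unfamiliar with the three no-reference fusion metrics named above, the following is a minimal sketch of how they are commonly defined for a single-channel 8-bit image. It is an illustration under stated assumptions, not the evaluation code used in this work: the function names are hypothetical, and normalization conventions for spatial frequency and average gradient differ slightly across the literature, so absolute values may not match those reported here.

```python
import numpy as np


def entropy(img: np.ndarray) -> float:
    """Shannon entropy (bits) of the grey-level histogram of a uint8 image."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]  # skip empty bins so log2 is defined
    return float(-(p * np.log2(p)).sum())


def spatial_frequency(img: np.ndarray) -> float:
    """sqrt(RF^2 + CF^2), where RF and CF are the RMS horizontal and
    vertical first differences (assumed convention: mean over the
    available differences)."""
    f = img.astype(np.float64)
    rf2 = np.mean((f[:, 1:] - f[:, :-1]) ** 2)  # row frequency squared
    cf2 = np.mean((f[1:, :] - f[:-1, :]) ** 2)  # column frequency squared
    return float(np.sqrt(rf2 + cf2))


def average_gradient(img: np.ndarray) -> float:
    """Mean of sqrt((dx^2 + dy^2) / 2) using forward differences."""
    f = img.astype(np.float64)
    dx = f[:-1, 1:] - f[:-1, :-1]
    dy = f[1:, :-1] - f[:-1, :-1]
    return float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0)))


if __name__ == "__main__":
    # Synthetic stand-in for a fused image; no real data is assumed.
    rng = np.random.default_rng(0)
    fused = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    print(entropy(fused), spatial_frequency(fused), average_gradient(fused))
```

Under these definitions, higher values of all three metrics indicate a fused image with richer grey-level content and sharper local detail, which is the sense in which a fusion method can be said to outperform another on them.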