With the development of information technology and the spread of electronic devices, images and videos have become important forms and carriers of information in everyday life. Efficiently mining valuable information from massive amounts of visual data has therefore become an important topic in computer vision research. Salient object detection, which is modeled on human visual attention, is increasingly applied in computer-based image processing. However, current color-depth (RGB-D) saliency models still exploit the correlations in depth cues insufficiently, and there remains significant room to improve detection quality. To address this, an improved RGB-D saliency detection model based on information guidance and multi-feature fusion is proposed. An absorbing Markov chain is introduced to optimize the guidance among low-level, middle-level, and high-level saliency maps so that each captures different feature information. The network is then progressively guided through feature encoding, multi-scale and multi-attention modules, and an attention refinement mechanism. In the experimental analysis, the proposed fusion model improved average classification accuracy by 5.23%, with an error below 0.1. Its performance on all four quantitative metrics exceeded 92%. The system's detection response rate exceeded 93%, although accuracy depends on the target object and decreases for some objects. The algorithm can serve as a reference for target localization and recognition and for virtual scene detection.
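The abstract names an absorbing Markov chain but does not state its formulation. As a minimal sketch, assuming the common graph-based saliency setup in which image regions (for example, superpixels) are transient nodes and image-boundary regions are duplicated as absorbing nodes, each region's saliency can be taken as its expected time to absorption, t = (I - Q)^{-1} 1, where Q is the transient-to-transient block of the row-normalized affinity matrix. The function names, the toy graph, and the min-max normalization are hypothetical choices for illustration, not the paper's implementation, and the sketch does not show how the low-, middle-, and high-level maps are combined.

```python
from typing import List

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b], copied so the caller's data is untouched.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def absorption_saliency(affinity: Matrix, num_transient: int) -> List[float]:
    """Hypothetical helper: saliency of transient nodes from absorption time.

    affinity: symmetric non-negative weights over all nodes. The first
    `num_transient` nodes are transient (regions), the rest are absorbing
    (assumed: duplicated boundary regions). Every transient node must be able
    to reach an absorbing node, otherwise I - Q is singular.
    """
    n = num_transient
    # Q: transition probabilities among transient nodes. Each degree is summed
    # over ALL nodes, so probability mass flowing to absorbing nodes is excluded from Q.
    q = []
    for i in range(n):
        degree = sum(affinity[i])
        q.append([affinity[i][j] / degree for j in range(n)])
    # Expected steps before absorption: solve (I - Q) t = 1.
    i_minus_q = [[(1.0 if i == j else 0.0) - q[i][j] for j in range(n)] for i in range(n)]
    t = solve_linear(i_minus_q, [1.0] * n)
    # Min-max normalize so the slowest-to-absorb node gets saliency 1.
    lo, hi = min(t), max(t)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in t]


if __name__ == "__main__":
    # Toy graph: path 4 - 0 - 1 - 2 - 3 - 5, where nodes 4 and 5 are absorbing.
    size = 6
    affinity = [[0.0] * size for _ in range(size)]
    for u, v in [(4, 0), (0, 1), (1, 2), (2, 3), (3, 5)]:
        affinity[u][v] = affinity[v][u] = 1.0
    saliency = absorption_saliency(affinity, num_transient=4)
    print([round(s, 3) for s in saliency])  # [0.0, 1.0, 1.0, 0.0]
```

In the toy path, the raw absorption times are 4, 6, 6 and 4 steps, so the two middle regions, which are farthest from the boundary, score highest. Pure-Python elimination keeps the example dependency-free. A real pipeline on a large superpixel graph would instead use a linear-algebra library.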