Multisource data fusion combines the complementary strengths of different data sources and thereby overcomes the limitations of relying on any single one. It has therefore been widely applied in fields such as lithological classification and mineral exploration. However, traditional deep learning algorithms do not effectively distinguish the importance of different features during fusion, so the model fails to concentrate on the most informative inputs. To address this issue, this paper introduces a ResHA network, built on a hybrid attention mechanism, to fuse features from ASTER remote sensing images, geochemical data, and digital elevation model (DEM) data. A case study in the Altay Orogenic Belt demonstrates the lithological classification process. The study examines how the order of the submodules within the hybrid attention mechanism affects performance and compares the results with those of multilayer perceptron (MLP), k-nearest neighbors (KNN), random forest (RF), and support vector machine (SVM) models. The experimental results show that (1) the ResHA network with hybrid attention assigned reasonable weights to the feature sets, allowing the model to focus on key features closely related to the task, which improved classification accuracy by 7.99% over the traditional models; and (2) applying channel attention followed by spatial attention achieved the highest overall accuracy, 98.06%.
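To make the submodule ordering concrete, the following is a minimal NumPy sketch of a generic hybrid attention step in which a channel attention submodule and a spatial attention submodule are applied in either order. It is an illustration under stated assumptions, not the ResHA implementation: the function names, the reduction ratio, the 3x3 kernel, and the random weights are all hypothetical, and a practical model would learn these weights inside a deep learning framework. The channel axis here stands in for stacked feature layers such as ASTER bands, geochemical layers, and DEM-derived layers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Reweight each channel of x (shape C, H, W).

    A shared two-layer MLP (w1, w2) scores the average-pooled and
    max-pooled channel descriptors; the summed scores pass through a sigmoid.
    """
    avg = x.mean(axis=(1, 2))                      # (C,)
    mx = x.max(axis=(1, 2))                        # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # ReLU hidden layer
    weights = sigmoid(mlp(avg) + mlp(mx))          # (C,), each in (0, 1)
    return x * weights[:, None, None]

def spatial_attention(x, kernel):
    """Reweight each pixel of x (shape C, H, W).

    Channel-wise mean and max maps are stacked and convolved with
    `kernel` (shape 2, k, k, k odd) using zero padding, then squashed by a sigmoid.
    """
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    height, width = x.shape[1:]
    logits = np.empty((height, width))
    for i in range(height):
        for j in range(width):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return x * sigmoid(logits)[None, :, :]

def hybrid_attention(x, w1, w2, kernel, order="channel_first"):
    """Apply both submodules in the requested order."""
    if order == "channel_first":
        return spatial_attention(channel_attention(x, w1, w2), kernel)
    if order == "spatial_first":
        return channel_attention(spatial_attention(x, kernel), w1, w2)
    raise ValueError(f"unknown order: {order}")

# Hypothetical sizes: 8 stacked feature layers on a 5x5 patch, reduction ratio 2.
rng = np.random.default_rng(0)
C, H, W, r = 8, 5, 5, 2
x = rng.normal(size=(C, H, W))
w1 = rng.normal(size=(C // r, C))
w2 = rng.normal(size=(C, C // r))
kernel = rng.normal(size=(2, 3, 3))

out_cs = hybrid_attention(x, w1, w2, kernel, order="channel_first")
out_sc = hybrid_attention(x, w1, w2, kernel, order="spatial_first")
print(out_cs.shape, out_sc.shape)
```

Because each submodule computes its weights from the output of the one before it, the two orders generally produce different feature maps, which is why the ordering can be treated as a design choice worth testing experimentally.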