Geological remote sensing interpretation plays a pivotal role in regional geological mapping, encompassing the analysis of rock, soil, and water features. However, these geological elements can be obscured by the surrounding geographical environment and modified by geological activity. The former limits the effectiveness of satellite remote sensing data because element features become invisible, while the latter leads to complex feature distributions and significant spatial variation among geological elements. Consequently, existing deep learning-based models for interpreting geological elements often exhibit limited accuracy. To address these issues, this study proposes the Contextually Enhanced Multiscale Feature Fusion Network (CEMFFNet) for the efficient interpretation of geological elements. First, a context enhancement module extracts abundant feature information and reinforces contextual features, capturing essential features and strengthening their interconnections. Second, a multiscale feature fusion module incorporates the SimAM attention mechanism to adaptively learn features from different channels, emphasizing the feature information that contributes to the interpretation results and retaining the comprehensive and crucial feature information for each element. Extensive experiments on two datasets demonstrate that the context enhancement module and the multiscale feature fusion module deliver superior overall interpretation accuracy compared with several representative deep learning networks. The proposed model improved oPA and mIoU by 2.4% and 2.8%, respectively, on the Landsat 8 dataset, and by 3.5% and 3.2%, respectively, on the Sentinel-2 dataset.
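
For readers unfamiliar with the two reported metrics, the following minimal sketch shows how they are commonly computed from a confusion matrix. It assumes that oPA denotes overall pixel accuracy (the abstract does not expand the acronym) and that mIoU is the unweighted mean of per-class intersection over union. The function names and the integer class labels are hypothetical and chosen only for illustration; this is not the evaluation code used in the study.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows index the true class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_pixel_accuracy(cm: List[List[int]]) -> float:
    """Fraction of all pixels whose predicted class matches the true class."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def mean_iou(cm: List[List[int]]) -> float:
    """Unweighted mean of per-class IoU = TP / (TP + FP + FN).

    Classes that appear in neither the labels nor the predictions are
    skipped (an assumption; other conventions count them as zero).
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, truly another class
        fn = sum(cm[c]) - tp                       # truly c, predicted another class
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0
```

Both functions take flattened per-pixel label sequences via `confusion_matrix`, so a segmentation map would need to be flattened first. Note that overall pixel accuracy is dominated by frequent classes, whereas mIoU weights every class equally, which is why the two are typically reported together.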