In recent years, remote sensing images of various types have found widespread applications in resource exploration, environmental protection, and land cover classification. However, relying solely on a single optical or synthetic aperture radar (SAR) image as the data source for land cover classification studies may not suffice to achieve the desired accuracy in ground information monitoring. One widely employed neural network for remote sensing image land cover classification is the U-Net network, which is a classical semantic segmentation network. Nonetheless, the U-Net network has limitations such as poor classification accuracy, misclassification and omission of small-area terrains, and a large number of network parameters. To address these challenges, this research paper proposes an improved approach that combines both optical and SAR images in bands for land cover classification and enhances the U-Net network. The approach incorporates several modifications to the network architecture. Firstly, the encoder-decoder framework serves as the backbone terrain-extraction network. Additionally, a convolutional block attention mechanism is introduced in the terrain extraction stage. Instead of pooling layers, convolutions with a step size of 2 are utilized, and the Leaky ReLU function is employed as the network's activation function. This design offers several advantages: it enhances the network's ability to capture terrain characteristics from both spatial and channel dimensions, resolves the loss of terrain map information while reducing network parameters, and ensures non-zero gradients during the training process. The effectiveness of the proposed method is evaluated through land cover classification experiments conducted on optical, SAR, and combined optical and SAR datasets. The results demonstrate that our method achieves classification accuracies of 0.8905, 0.8609, and 0.908 on the three datasets, respectively, with corresponding mIoU values of 0.8104, 0.7804, and 0.8667. Compared to the traditional U-Net network, our method exhibits improvements in both classification accuracy and mIoU to a certain extent.