Land cover mapping provides spatial information on the physical properties of the Earth’s surface for various classes such as wetlands, artificial surfaces and constructions, vineyards, and water bodies. Reliable land cover information is crucial for developing solutions to a variety of environmental problems, such as the destruction of important wetlands/forests and the loss of fish and wildlife habitats. This has made land cover mapping one of the most widespread application areas in remote sensing computational imaging. However, due to the differences between modalities in terms of resolution, content, and sensors, integrating the complementary information that multi-modal remote sensing imagery exhibits into a robust and accurate system remains challenging, and classical segmentation approaches generally do not give satisfactory results for land cover mapping. In this paper, we propose a novel dynamic deep network architecture, AMM-FuseNet, that promotes the use of multi-modal remote sensing images for land cover mapping. The proposed network exploits a hybrid approach combining the channel attention mechanism and densely connected atrous spatial pyramid pooling (DenseASPP). In the experimental analysis, in order to verify the validity of the proposed method, we test AMM-FuseNet on four datasets and compare it to six state-of-the-art models: DeepLabV3+, PSPNet, UNet, SegNet, DenseASPP, and DANet. In addition, we demonstrate the capability of AMM-FuseNet under minimal training supervision (a reduced number of training samples) compared to the state-of-the-art, achieving a smaller accuracy loss even with only 1/20 of the training samples.
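To make the channel attention component concrete, the following is a minimal PyTorch sketch of a squeeze-and-excitation style channel attention module, one common realisation of the mechanism referred to above. It is an illustrative assumption, not the exact AMM-FuseNet implementation; the class name `ChannelAttention` and the reduction ratio are hypothetical.

```python
# Hypothetical squeeze-and-excitation style channel attention sketch;
# not the exact AMM-FuseNet implementation.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Squeeze: global average pooling collapses each feature map to a scalar.
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Excitation: a small bottleneck MLP produces per-channel weights in (0, 1).
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)       # (B, C) per-channel descriptors
        w = self.fc(w).view(b, c, 1, 1)   # (B, C, 1, 1) attention weights
        return x * w                      # re-weight the input feature maps


if __name__ == "__main__":
    attn = ChannelAttention(channels=32)
    features = torch.randn(2, 32, 64, 64)
    out = attn(features)
    print(out.shape)  # same shape as the input: (2, 32, 64, 64)
```

In a multi-modal fusion setting, such a module can re-weight the feature channels of each modality branch before merging, letting the network emphasise the more informative modality per channel.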