Recently, several land cover classification models have achieved great success in terms of both accuracy and computational performance. However, the task remains challenging due to inter-class similarities, intra-class variations, scale-related inaccuracies, and high computational complexity. First, existing methods fail to establish correlations among different feature maps during multiscale feature extraction, leaving them susceptible to inter-class similarities and intra-class variations. Second, they under-utilize the contextual feature interdependencies contained in each layer of the encoder-decoder architecture, causing scale-related inaccuracies. Third, their upsampling strategies introduce checkerboard artifacts and blurry edges, which degrade the accuracy of the generated segmentation map at increased computational cost. To address these problems, this article proposes a novel multiscale context-aware feature fusion network (MCN) for high-resolution urban scene images. MCN mainly consists of three modules: 1) a multiscale feature enhancement module (MFE) for the backbone network that dynamically extracts rich spatial information by incorporating dense correlations among feature maps with different receptive fields, 2) a multilayer feature fusion module (MLF), used as skip connections, that produces a single high-level representation of the local-global context by capturing low-, mid-, and high-level interdependencies at different encoder-decoder stages, and 3) a pixel shuffle decoder (PSD) that reduces blurry edges and checkerboard artifacts during upsampling while using fewer parameters. Experiments on three high-resolution aerial and satellite urban scene datasets show that MCN consistently outperforms mainstream land cover classification models. Specifically, MCN achieves an overall accuracy (OA) of 93.51% on Potsdam and 90.18% on Vaihingen, and a mean intersection over union (mIoU) of 73.73% on DeepGlobe.
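The abstract does not detail the PSD internals, but the underlying sub-pixel convolution (pixel shuffle) technique it names is well established. The sketch below illustrates, under the assumption of a PyTorch implementation, how a convolution followed by `torch.nn.PixelShuffle` upsamples a feature map without the overlapping-stride kernels of a transposed convolution that cause checkerboard artifacts; the class name `PixelShuffleUpsample` and its parameters are illustrative, not the paper's code.

```python
import torch
import torch.nn as nn

class PixelShuffleUpsample(nn.Module):
    """Illustrative sub-pixel convolution block (not the paper's exact PSD).

    A convolution expands the channel dimension by scale**2, then
    nn.PixelShuffle rearranges those channels into spatial resolution:
    (B, C*r^2, H, W) -> (B, C, H*r, W*r). Because every output pixel is
    produced by the same evenly applied kernel, the uneven kernel overlap
    of strided transposed convolutions, and hence the checkerboard
    pattern, is avoided.
    """

    def __init__(self, in_channels: int, out_channels: int, scale: int = 2):
        super().__init__()
        self.expand = nn.Conv2d(
            in_channels, out_channels * scale ** 2, kernel_size=3, padding=1
        )
        self.shuffle = nn.PixelShuffle(scale)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.shuffle(self.expand(x)))

# Quick check: double the spatial resolution of a 64-channel feature map.
x = torch.randn(1, 64, 32, 32)
up = PixelShuffleUpsample(in_channels=64, out_channels=32, scale=2)
print(up(x).shape)  # torch.Size([1, 32, 64, 64])
```

Since the learnable weights live in an ordinary convolution and the pixel rearrangement itself is parameter-free, such a block can also be cheaper than a large-kernel transposed convolution, consistent with the abstract's claim of upsampling with fewer parameters.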