“…Deep learning models have been applied successfully to a range of computer vision and remote sensing tasks, such as object detection (Wu et al, 2020; Zhao et al, 2019), image segmentation (Ghosh et al, 2019; Wang et al, 2019), human activity monitoring (Toshev and Szegedy, 2014; Zheng et al, 2019), object tracking (Ciaparrone et al, 2020; Zhai et al, 2018), and semantic segmentation. Semantic segmentation is an essential input for many applications in computer vision and remote sensing, including scene understanding for autonomous driving (Siam et al, 2018), augmented reality (Ko and Lee, 2020), and environmental monitoring applications such as precision agriculture (Anand et al, 2021), change detection (Venugopal, 2020), and urban mapping and monitoring (Du et al, 2021). In urban remote sensing, discriminating the different elements of a city, including different kinds of buildings, paved areas, water bodies, trees and grasslands, cars, and clutter, is challenging due to variations in shape, structure, texture, and colour (Diakogiannis et al, 2020).…”