“…Subsequently, numerous researchers (Adhikari et al, 2019;Badrinarayanan et al, 2017;Kolhar and Jagtap, 2021;Milioto et al, 2018;Peng et al, 2019) used convolutional encoder-decoder networks for semantic segmentation. The Visual Geometry Group (VGG) (Simonyan and Zisserman, 2015), ResNet (He et al, 2015), and InceptionV3 (Szegedy et al, 2015) networks achieve top-5 accuracy of 92.7%, 93.3%, and 93.9%, respectively on ImageNet dataset (Deng et al, 2009), which proves that these networks have good features extraction ability and often (Gao et al, 2020;Hecht et al, 2020;Majeed et al, 2018;Ou et al, 2019;Panda et al, 2022;Shah et al, 2022;Zou et al, 2021) used as a backbone in various CNN architectures developed for semantic segmentation. Therefore, the performance of the proposed models in this study was compared to the above-mentioned models while having these networks as an encoder backbone.…”