Semantic segmentation is a fundamental computer vision task in Mobile
Edge Computing. It requires classifying every pixel in the image and
therefore demands higher classification accuracy than the image
classification task. Fine-grained classification tasks
require more detailed information: in addition to classifying each
pixel according to the semantic and spatial information of that pixel
and its surrounding pixels, the model must also distinguish it from
adjacent pixels, which is one of the main difficulties of the current
segmentation task. High-resolution input images provide more detailed
information, but they come with high computational cost, so
lower-resolution images are typically used in practical applications
to preserve computing speed. As another task of computer
vision, super-resolution recovery focuses on extracting information from
low-resolution images and inferring higher-resolution feature maps. The
detail features it recovers can help semantic segmentation classify
pixels with higher precision. Considering the
complementarity of the two tasks, we use a transformer as the feature
extractor and design an algorithm that performs semantic segmentation
and super-resolution recovery at the same time. Multi-task learning
encourages the backbone network to learn more shared high-dimensional
information. We then use the output of the super-resolution recovery
branch to guide the semantic segmentation task with more detailed
information, and finally obtain an improvement over the original
baseline.
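
To make the multi-task idea concrete, the following is a minimal sketch of
one way a joint training objective for the two branches could be written. It
is an illustration under stated assumptions, not the formulation used in this
work: the choice of per-pixel cross-entropy for segmentation, mean squared
error for super-resolution recovery, the weighted sum, and the names
`joint_loss` and `sr_weight` are all hypothetical. The guidance of the
segmentation branch by the super-resolution output is not modeled here.

```python
import math


def pixel_cross_entropy(logits, labels):
    """Mean per-pixel cross-entropy.

    logits: one list of class scores per pixel.
    labels: one ground-truth class index per pixel.
    """
    total = 0.0
    for scores, label in zip(logits, labels):
        # Numerically stable log-sum-exp over the class scores.
        m = max(scores)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_sum_exp - scores[label]
    return total / len(labels)


def mean_squared_error(pred, target):
    """Mean squared error between flattened pixel intensities."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def joint_loss(seg_logits, seg_labels, sr_pred, sr_target, sr_weight=0.5):
    """Hypothetical joint objective: segmentation loss plus a weighted
    super-resolution reconstruction loss (sr_weight is an assumed value)."""
    seg = pixel_cross_entropy(seg_logits, seg_labels)
    sr = mean_squared_error(sr_pred, sr_target)
    return seg + sr_weight * sr


# Toy example: two pixels, two classes, two reconstructed intensities.
loss = joint_loss(
    seg_logits=[[2.0, 0.0], [0.0, 2.0]],
    seg_labels=[0, 1],
    sr_pred=[0.5, 0.5],
    sr_target=[0.0, 1.0],
)
```

Because both terms are computed from outputs of the same backbone, gradients
from the reconstruction term also shape the shared features, which is the
sense in which multi-task learning can help the segmentation branch.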