“…At the same time, to help the system capture information about occluded targets more quickly, channel attention is introduced before each pooling operation. This weights the convolutional features at every pixel, strengthens the representation of useful information, and uses multilevel channel attention to drive feature fusion, providing an effective basis for combining global semantic information with local detail and ultimately yielding better segmentation results. [10][11][12][13] Based on the network structure design shown in Table 1, a 256×256 three-channel RGB image first passes through two convolution layers (3×3 kernels, stride 1), a batch normalization layer, the attention module, and a max pooling layer; this downsampling step is then repeated according to the same rules, with the spatial size of the feature map shrinking and the channel depth growing at each stage. In the decoder, the attention-weighted features retained during downsampling are concatenated with the same-scale upsampled features, and the upsampling step is repeated in the same fashion so that the feature map gradually grows in spatial size while its depth decreases, finally producing a 256×256×64 feature map.…”
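The attention-before-pooling scheme described in the excerpt can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: a squeeze-and-excitation style channel attention (global average pool, then FC–ReLU–FC–sigmoid) is applied before a 2×2 max pooling step, and the attention-weighted features are reused as the skip connection that is concatenated with the same-scale upsampled features in the decoder. The weight matrices, the reduction ratio of 4, and the nearest-neighbor upsampling are assumptions standing in for learned parameters and the actual layer choices.

```python
import numpy as np

def channel_attention(x, reduction=4, rng=None):
    """SE-style channel attention on a (C, H, W) feature map.
    The two FC weight matrices are random stand-ins for learned parameters."""
    c = x.shape[0]
    rng = np.random.default_rng(0) if rng is None else rng
    z = x.mean(axis=(1, 2))                          # squeeze: global average pool -> (C,)
    w1 = 0.1 * rng.standard_normal((c // reduction, c))
    w2 = 0.1 * rng.standard_normal((c, c // reduction))
    gate = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))  # FC-ReLU-FC-sigmoid
    return x * gate[:, None, None]                   # reweight each channel

def max_pool2x2(x):
    """2x2 max pooling with stride 2 on a (C, H, W) feature map."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample2x(x):
    """Nearest-neighbor 2x upsampling (stand-in for learned upsampling)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

# One encoder step: attention is applied before pooling, as the text describes.
rng = np.random.default_rng(1)
feat = rng.standard_normal((64, 256, 256))   # e.g. output of the two 3x3 convs
skip = channel_attention(feat)               # attention-weighted features, kept for the skip
down = max_pool2x2(skip)                     # spatial size halves: 64 x 128 x 128

# Matching decoder step: upsample, then concatenate with the same-scale skip.
up = upsample2x(down)                        # back to 64 x 256 x 256
fused = np.concatenate([up, skip], axis=0)   # 128 x 256 x 256, input to the next convs
print(fused.shape)
```

In a real network the fused tensor would then pass through further convolutions that reduce the channel depth again, which is how the decoder path described above ends at a 256×256×64 feature map.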