“…The multi-scale features of the image are extracted without loss of information, and combining them with residual units makes the network less susceptible to degradation. The encoding and decoding modules extract the global context information of the image, estimate the class probability of each pixel from the fused features, and pass it to the classifier for pixel-level cloud/non-cloud segmentation [27]. Subsequently, a large number of algorithms combining U-Net with attention mechanisms have been proposed [24,72,76].…”
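To make the ingredients of this description concrete, the sketch below illustrates three of them on a toy single-band image: a residual unit of the form y = x + F(x), a simple two-scale feature (2×2 average pooling followed by nearest-neighbour upsampling), and a per-pixel two-class softmax that turns fused features into a cloud/non-cloud mask. It is a minimal NumPy illustration under stated assumptions, not the network of [27]: the layer layout, the helper names (`conv3x3`, `residual_unit`, `multiscale`, `pixel_classifier`), the random weights and the label order (1 = cloud) are all hypothetical.

```python
import numpy as np

def conv3x3(x, kernel):
    """Same-size 3x3 cross-correlation (as in deep-learning 'conv' layers) of an (H, W) map."""
    padded = np.pad(x, 1, mode="edge")  # edge padding keeps the output at (H, W)
    h, w = x.shape
    out = np.zeros(x.shape, dtype=float)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def residual_unit(x, k1, k2):
    """Hypothetical residual unit: y = x + F(x), with F = conv -> ReLU -> conv.

    If F learns to output zero, the unit reduces to the identity, which is why
    stacking such units is less prone to degradation than plain stacked layers.
    """
    f = np.maximum(conv3x3(x, k1), 0.0)
    f = conv3x3(f, k2)
    return x + f

def multiscale(x):
    """Coarse-scale feature: 2x2 average pooling, then nearest upsampling back to (H, W).

    Assumes H and W are even.
    """
    h, w = x.shape
    pooled = x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return np.repeat(np.repeat(pooled, 2, axis=0), 2, axis=1)

def pixel_classifier(features, w, b):
    """Per-pixel two-class softmax over C fused feature channels.

    features: (C, H, W); w: (2, C); b: (2,).
    Returns probabilities of shape (2, H, W) and an (H, W) mask (1 = cloud, assumed).
    """
    logits = np.einsum("kc,chw->khw", w, features) + b[:, None, None]
    logits -= logits.max(axis=0, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=0, keepdims=True)
    mask = probs.argmax(axis=0)
    return probs, mask

# Usage on a toy 8x8 image with random (untrained) weights.
rng = np.random.default_rng(0)
img = rng.random((8, 8))
res = residual_unit(img, 0.1 * rng.normal(size=(3, 3)), 0.1 * rng.normal(size=(3, 3)))
features = np.stack([img, res, multiscale(img)])  # fuse fine, residual and coarse features
probs, mask = pixel_classifier(features, rng.normal(size=(2, 3)), np.zeros(2))
```

With trained weights the mask would follow the cloud boundaries; here it only demonstrates the data flow from fused features to a per-pixel decision.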