The machine learning approach has shown its state-of-the-art ability to handle segmentation and detection tasks. It is increasingly employed to extract patterns and spatiotemporal features from the ever-increasing stream of Earth system data. However, there is still a significant challenge, which is the generalization capability of the model on cloud images in different types and weather conditions. After studying several popular methods, we propose a semantic segmentation neural network for cloud segmentation. It extracts features learned by source and target domains in an end-to-end behavior, which can address the problem of significant lack of labels in the observed cloud image data. It is further evaluated on the Singapore Whole Sky Image Segmentation (SWIMSEG) dataset by using Mean Intersection-over-Union, recall, F-score, and accuracy matrices. The scores of these matrices are 86%, 97%, 92%, and 96%, which prove that it has excellent efficiency and robustness. Most importantly, a new benchmark based on the SWIMSEG dataset for the task of cloud segmentation is introduced. The others, BENCHMARK, Cirrus Cumulus Stratus Nimbus are evaluated through the model trained from the SWIMSEG dataset by way of visualization.
Plain Language Summary The machine learning approaches offer a new view about howto effectively and comprehensively understand ground-based cloud datasets. The essential advantage of deep learning methods is that it can extract more critical cloud features automatically than traditional algorithms, such as spatiotemporal features. Therefore, it is worth exploring the possibility of cloud segmentation with the help of deep learning techniques. We first introduce a semantic segmentation neural network for cloud segmentation problems after measuring a few classic neural networks. The results exceed the traditional methods by a large margin with standard evaluation matrices, such as Mean Intersection-over-Union, recall, F-score, and accuracy. The scores achieved here may accomplish as a baseline for competitive development. Second, the trained model is used to produce a few cloud masks in two public datasets: BENCHMARK, Singapore Whole Sky Image Segmentation, and their respective performance is further evaluated. Finally, the segmentation results show the excellent performance and generalization in another untrained dataset, Cirrus Cumulus Stratus Nimbus.