Abstract: In this paper, we address class incremental learning (IL) in remote sensing image analysis. Since remote sensing images are acquired continuously over time by Earth observation sensors, the land-cover/land-use classes on the ground are likely to be discovered gradually rather than all at once. This restricts the deployment of stand-alone classification approaches, which are trained on all classes together in a single iteration. Therefore, for every new set of categories discovered, the entire network consisting of …
“…In recent years, many benchmark datasets have been released, and these datasets have been studied with a high level of success, such as in [8,19,20,21]. With these datasets, a high level of average recall can be obtained through the careful selection of hyperparameters and augmentation schemes.…”
Too often, the testing and evaluation of object detection and classification techniques for high-resolution remote sensing imagery are confined to clean, discretely partitioned datasets, i.e., the closed-world model. In recent years, performance on a number of benchmark datasets has exceeded 99% when evaluated with cross-validation. However, real-world remote sensing data are truly big data, often exceeding billions of pixels. One of the greatest challenges in evaluating machine learning models taken out of the clean laboratory setting and into the real world is therefore measuring performance. These models must be evaluated at a grander scale, namely tens of thousands of square kilometers, where ground-truthing the ever-changing anthropogenic surface of Earth is intractable. The ultimate goal of computer vision model development for automated analysis and broad-area search and discovery is to augment and assist humans, specifically through human–machine teaming on real-world tasks. In this research, various models were trained using object classes from benchmark datasets such as UC Merced, PatternNet, RESISC-45, and MDSv2. We detail techniques for scanning broad swaths of the Earth with deep convolutional neural networks, present algorithms for localizing object detection results, and describe a methodology for evaluating the results of broad-area scans. Our research explores the challenges of transitioning these models out of the training–validation laboratory setting and into the real-world application domain. We show a scalable approach that leverages state-of-the-art deep convolutional neural networks for the search, detection, and annotation of objects within large swaths of imagery, with the ultimate goal of providing a methodology for evaluating object detection machine learning models in real-world scenarios.
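Scanning broad swaths of imagery with a CNN, as described above, reduces in practice to tiling a very large raster into fixed-size chips the network can consume, with overlap so objects straddling chip borders are not missed. The abstract does not give the paper's chip size, stride, or code; the sketch below is a minimal illustration under those assumptions (`generate_chips`, the 256-pixel chip, and the 192-pixel stride are all hypothetical):

```python
def generate_chips(height, width, chip=256, stride=192):
    """Yield (row, col) offsets of overlapping chips that fully cover a
    height x width raster. The stride < chip gives border overlap; the
    final row/column offsets are clamped so edges are always covered."""
    last_r = max(height - chip, 0)
    last_c = max(width - chip, 0)
    rows = list(range(0, last_r + 1, stride))
    cols = list(range(0, last_c + 1, stride))
    if rows[-1] != last_r:       # cover the bottom edge when the stride
        rows.append(last_r)      # does not divide the raster evenly
    if cols[-1] != last_c:       # likewise for the right edge
        cols.append(last_c)
    for r in rows:
        for c in cols:
            yield r, c

# A 1000 x 1000 raster yields a 5 x 5 grid of overlapping 256-pixel chips.
offsets = list(generate_chips(1000, 1000))
```

At continental scale the same loop would be driven by a windowed raster reader rather than an in-memory array, but the offset arithmetic is unchanged.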
“…The continual learning benchmark for remote sensing [64] is a large-scale remote sensing image scene classification database built around three CL scenarios. CILEA-Net [65] proposes a CL strategy based on the incremental learning of new classes, ordered according to their similarity with the old ones. In [66], a framework combining incremental learning with open-set recognition, together with a new loss, is proposed for RS image scene classification.…”
In the field of earth observation (EO), continual learning (CL) algorithms have been proposed to deal with large datasets by decomposing them into several subsets and processing them incrementally. The majority of these algorithms assume that the data come from a single source and are fully labeled. Real-world EO datasets are instead characterized by large heterogeneity (e.g., coming from aerial, satellite, or drone scenarios), and for the most part they are unlabeled, meaning they can be fully exploited only through the emerging self-supervised learning (SSL) paradigm. For these reasons, in this article, we present a new algorithm for merging SSL and CL for remote sensing applications, which we call continual Barlow twins. It combines the advantages of one of the simplest self-supervision techniques, i.e., Barlow twins, with the elastic weight consolidation method to avoid catastrophic forgetting. In addition, we evaluate the proposed continual SSL approach on a highly heterogeneous EO dataset, showing the effectiveness of this strategy on a novel combination of three almost non-overlapping domain datasets (airborne Potsdam, satellite US3D, and a drone unmanned aerial vehicle semantic segmentation dataset) on a crucial downstream task in EO, i.e., semantic segmentation. Encouraging results show the superiority of SSL in this setting and the effectiveness of building an incrementally trained, effective pretrained feature extractor, based on ResNet50, without requiring the complete availability of all the data, saving considerable time and resources.
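The two ingredients named above, the Barlow Twins objective and the elastic weight consolidation (EWC) penalty, can be sketched numerically. This is a minimal illustration of how the two loss terms combine, not the paper's implementation; the function names, the weighting constants, and the toy inputs are all assumptions:

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins: push the cross-correlation matrix of two embedding
    views toward the identity (invariance on the diagonal, redundancy
    reduction off the diagonal)."""
    n, _ = z_a.shape
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-8)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-8)
    c = z_a.T @ z_b / n                                  # d x d matrix
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lam * off_diag

def ewc_penalty(params, old_params, fisher, alpha=1.0):
    """Elastic weight consolidation: quadratically penalise drift of the
    parameters that the Fisher information marks as important for the
    previously learned domain."""
    return alpha * (fisher * (params - old_params) ** 2).sum()

rng = np.random.default_rng(0)
z = rng.normal(size=(128, 16))                 # toy batch of embeddings
# On a new domain, the total objective is the SSL loss plus the EWC term.
total = barlow_twins_loss(z, z) + ewc_penalty(
    np.ones(10), np.ones(10), np.ones(10))
```

When the current parameters equal the old ones the EWC term vanishes, so the penalty only activates as training on the new domain pulls important weights away from their previous values.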
“…Incremental learning is a technique that allows a neural network to continuously update its parameters with incremental data, breaking the traditional one-off training process in deep learning. IL has been explored in various fields, including computer vision [7], [8], natural language processing [9], [10], and remote sensing [11]. The most significant challenge in IL is catastrophic forgetting, which occurs when parameter updates result in the loss of previously learned knowledge. This phenomenon was first identified and discussed as early as the 1980s by McCloskey et al. [12].…”
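Catastrophic forgetting, as described in the quote above, can be shown with a deliberately tiny example: a single weight fitted by gradient descent on one task and then updated on a second task with no retention mechanism. The setup (targets, step counts) is purely illustrative:

```python
def sgd(w, target, steps=200, lr=0.1):
    """Minimise (w - target)^2 by plain gradient descent."""
    for _ in range(steps):
        w -= lr * 2 * (w - target)   # gradient of the squared error
    return w

w = sgd(0.0, 1.0)                    # learn task A (target 1.0)
err_a_before = (w - 1.0) ** 2        # near zero: task A is learned
w = sgd(w, -1.0)                     # learn task B on the same weight
err_a_after = (w - 1.0) ** 2         # near 4: task A is forgotten
```

The same weight serves both tasks, so optimising for task B overwrites the task-A solution entirely; IL methods such as regularisation, replay, or parameter isolation exist precisely to prevent this.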
Continual semantic segmentation (CSS) based on incremental learning (IL) is a significant endeavour in developing human-like segmentation models. However, current CSS approaches face a difficult trade-off between preserving old knowledge and learning new classes, and they still require large-scale annotated data for incremental training and lack interpretability. In this paper, we present Learning at a Glance (LAG), an efficient, robust, human-like, and interpretable approach for CSS. Specifically, LAG is a simple and model-agnostic architecture, yet it achieves competitive CSS efficiency with limited incremental data. Inspired by human-like recognition patterns, we propose a semantic-invariance modelling approach via semantic feature decoupling that simultaneously reconciles solid knowledge inheritance and new-term learning. Concretely, the proposed decoupling operates in two ways, i.e., channel-wise decoupling and spatial-level, neuron-relevant semantic consistency. Our approach preserves semantic-invariant knowledge as solid prototypes to alleviate catastrophic forgetting, while also constraining sample-specific content through an asymmetric contrastive learning method to enhance model robustness during IL steps. Experimental results on multiple datasets validate the effectiveness of the proposed method. Furthermore, we introduce a novel CSS protocol that better reflects realistic data-limited CSS settings, and LAG achieves superior performance under multiple data-limited conditions.