Abstract: A challenging problem in image content extraction and classification is building a system that automatically learns high-level semantic interpretations of images. We describe a Bayesian framework for a visual grammar that aims to reduce the gap between low-level features and high-level user semantics. Our approach includes modeling image pixels through automatic fusion of their spectral, textural, and other ancillary attributes; segmenting image regions with an iterative split-and-merge algorithm; and representing scenes by decomposing them into prototype regions and modeling the interactions between these regions in terms of their spatial relationships. Naive Bayes classifiers are used to learn models for region segmentation and classification from positive and negative examples of user-defined semantic land cover labels. The system also automatically learns representative region groups that can distinguish different scenes and builds visual grammar models. Experiments using Landsat scenes show that the visual grammar enables the creation of high-level classes that cannot be modeled by individual pixels or regions. Furthermore, learning the classifiers requires only a few training examples.
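A minimal sketch of the kind of classifier described above: a Gaussian naive Bayes model over region-level feature vectors, trained from a few positive and negative examples of a user-defined land cover label. The feature values and label meanings below are hypothetical, and this is not the authors' implementation.

```python
# Minimal sketch (not the paper's implementation): Gaussian naive Bayes over
# region-level feature vectors, trained from a handful of positive and
# negative examples of a user-defined land cover label.
import numpy as np

class GaussianNaiveBayes:
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mean = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-6 for c in self.classes])
        self.prior = np.array([np.mean(y == c) for c in self.classes])
        return self

    def predict(self, X):
        # log p(c | x) is proportional to log p(c) + sum_d log N(x_d; mu_cd, var_cd)
        log_lik = -0.5 * (np.log(2 * np.pi * self.var)[None]
                          + (X[:, None, :] - self.mean[None]) ** 2 / self.var[None]).sum(-1)
        return self.classes[np.argmax(np.log(self.prior) + log_lik, axis=1)]

# Hypothetical usage: rows are region feature vectors (e.g., spectral/texture
# statistics); label 1 = positive example of the target class, 0 = negative.
X_train = np.array([[0.42, 0.11], [0.45, 0.09], [0.10, 0.70], [0.12, 0.65]])
y_train = np.array([1, 1, 0, 0])
model = GaussianNaiveBayes().fit(X_train, y_train)
print(model.predict(np.array([[0.40, 0.12]])))   # -> [1]
```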
Although non-pansharpened Worldview-2 (WV) images have 2-m resolution, the revisit time for the same area may be seven days or more. In contrast, Planet images are collected by small satellites that cover the whole Earth almost daily, but their resolution is 3.125 m. It would therefore be ideal to fuse images from these two satellites to generate products with high spatial resolution (2 m) and high temporal resolution (1 or 2 days) for applications that require quick decisions, such as damage assessment and border monitoring. In this paper, we evaluate three approaches to fusing Worldview (WV) and Planet images: the Spatial and Temporal Adaptive Reflectance Fusion Model (STARFM), Flexible Spatiotemporal Data Fusion (FSDAF), and Hybrid Color Mapping (HCM), all of which have been applied to the fusion of MODIS and Landsat images in recent years. Experimental results using actual Planet and Worldview images demonstrate that the three approaches have comparable performance and can all generate high-quality prediction images.
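As a rough illustration of the simplest of the three approaches, the sketch below follows the spirit of HCM: learn a global linear band mapping (with bias) from a Planet image to a co-registered Worldview image acquired on the same date, then apply that mapping to the Planet image from the prediction date. It assumes both images have already been resampled to a common grid; the function names and toy arrays are illustrative, not the published algorithm.

```python
# Simplified sketch in the spirit of HCM (not the published implementation):
# fit a least-squares band mapping on the paired date, apply it on the
# prediction date. Assumes Planet and WV chips are co-registered to one grid.
import numpy as np

def fit_band_mapping(planet_t0, wv_t0):
    """Least-squares transform T such that [planet, 1] @ T approximates wv per pixel."""
    n_bands = planet_t0.shape[-1]
    X = planet_t0.reshape(-1, n_bands)
    X = np.hstack([X, np.ones((X.shape[0], 1))])          # append bias column
    Y = wv_t0.reshape(-1, wv_t0.shape[-1])
    T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return T

def predict_wv(planet_t1, T):
    """Apply the learned mapping to the prediction-date Planet image."""
    h, w, n_bands = planet_t1.shape
    X = planet_t1.reshape(-1, n_bands)
    X = np.hstack([X, np.ones((X.shape[0], 1))])
    return (X @ T).reshape(h, w, -1)

# Toy example: 4-band Planet-like and WV-like chips on the same grid.
rng = np.random.default_rng(0)
planet_t0 = rng.random((64, 64, 4))
wv_t0 = planet_t0 @ rng.random((4, 4)) + 0.05              # synthetic relation
T = fit_band_mapping(planet_t0, wv_t0)
wv_pred = predict_wv(rng.random((64, 64, 4)), T)           # synthesized WV-like image
```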
Automatic content extraction, classification, and content-based retrieval are highly desired goals in intelligent remote sensing databases. Pixel-level processing has been the common choice for both academic and commercial systems. We extend the modeling of remotely sensed imagery to three levels: pixel level, region level, and scene level. Pixel-level features are generated using unsupervised clustering of spectral values, texture features, and ancillary data such as digital elevation models. Region-level features include shape information and statistics of pixel-level feature values. Scene-level features include statistics and spatial relationships of regions. This chapter describes our work on developing a probabilistic visual grammar to reduce the gap between low-level features and high-level user semantics, and to support complex query scenarios that consist of many regions with different feature characteristics. The visual grammar includes automatic identification of region prototypes and modeling of their spatial relationships. The system learns the prototype regions in an image collection using unsupervised clustering. Spatial relationships are represented by fuzzy membership functions. The system automatically selects significant relationships from training data and builds visual grammar models, which can also be updated using user relevance feedback. A Bayesian framework is used to automatically classify scenes based on these models. We demonstrate our system with query scenarios that cannot be expressed by traditional region- or scene-level approaches, but for which the visual grammar provides accurate classifications and effective retrieval.
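The fuzzy spatial relationships can be pictured with a small sketch such as the one below, which scores how strongly one region lies in a given direction from another using the angle between their centroids. This is an assumed cosine-shaped membership for illustration, not necessarily the chapter's exact formulation.

```python
# Minimal sketch (an assumption, not the chapter's exact membership functions):
# a directional fuzzy relationship between two regions, computed from their
# centroids. RIGHT_OF peaks when region B lies due east of region A and decays
# to zero as the direction rotates 90 degrees away.
import numpy as np

def directional_membership(centroid_a, centroid_b, reference_angle):
    """Fuzzy degree to which B lies in the given direction (radians) from A."""
    dy = centroid_b[0] - centroid_a[0]
    dx = centroid_b[1] - centroid_a[1]
    theta = np.arctan2(-dy, dx)                  # image rows grow downward
    diff = np.abs(np.angle(np.exp(1j * (theta - reference_angle))))
    return max(0.0, np.cos(diff))                # 1 when aligned, 0 beyond 90 degrees

RIGHT_OF, ABOVE = 0.0, np.pi / 2

a = (120, 40)    # (row, col) centroid of region A
b = (118, 95)    # region B: far to the right of A, slightly above it
print(round(directional_membership(a, b, RIGHT_OF), 3))  # close to 1
print(round(directional_membership(a, b, ABOVE), 3))     # close to 0
```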