Vision-based robot localization outdoors has remained more elusive than its indoor counterpart. Drastic illumination changes and the scarcity of suitable landmarks are the main difficulties. This paper attempts to surmount them by deviating from the main trend of using local features. Instead, a global descriptor called a landmark-view is defined, which aggregates the most visually salient landmarks present in each scene. Landmark co-occurrence, together with the spatial and saliency relationships among landmarks, is thus added to the single-landmark characterization, which is based on saliency and color distribution. A suitable framework to compare landmark-views is developed, and it is shown how this remarkably enhances recognition performance compared with single-landmark recognition. A view-matching model is constructed using logistic regression. Experimentation on 45 views acquired outdoors, containing 273 landmarks, yielded good recognition results: the overall percentage of correct view classification was 80.6%, indicating the adequacy of the approach.

Keywords: visual landmarks, visual saliency, robot navigation, autonomous robot
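The abstract states that view matching is decided by a logistic regression model. The paper's actual features and training data are not given here, so the following is only an illustrative sketch: it fits a logistic model by gradient descent on hypothetical view-similarity features (saliency similarity, color-distribution similarity, spatial-layout consistency) and outputs a match probability. All feature names and toy values are assumptions, not the authors' data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights (plus a bias term) by batch gradient descent on log-loss."""
    w = [0.0] * (len(X[0]) + 1)  # last entry is the bias
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi + [1.0])))
            for j, xj in enumerate(xi + [1.0]):
                grad[j] += (p - yi) * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def match_probability(w, features):
    """Probability that two views depict the same place."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features + [1.0])))

# Toy training set (hypothetical): each row is
# [saliency similarity, color-distribution similarity, spatial consistency];
# label 1 = same place, 0 = different place.
X = [[0.9, 0.8, 0.9], [0.8, 0.9, 0.7], [0.2, 0.3, 0.1], [0.1, 0.2, 0.3]]
y = [1, 1, 0, 0]
w = train_logistic(X, y)
```

A learned threshold on `match_probability` (e.g. 0.5) then yields the binary view-classification decision whose accuracy the abstract reports.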
Introduction

The extraction of reliable visual landmarks for mobile robot localization in unknown, unstructured outdoor environments is still an open research problem. One of the key factors that makes the detection and recognition of visual landmarks challenging outdoors, as well as indoors without dominant artificial illumination, is that the acquired visual information depends strongly on the lighting geometry (direction and intensity of the light source) and the illuminant color (spectral power distribution), both of which change with sun position and atmospheric conditions.

Most feature extraction approaches are not adequate for this type of environment, since they rely either on structured information from non-deformable objects [7] or on a priori knowledge about the landmarks [3]. Several recent works achieved interesting results using SIFT features to match pairs of images [18,22,32], which can be extended to the landmark recognition problem. Since mobile robot navigation tasks require real-time execution, some efforts have been made to reduce the considerable computational cost of evaluating SIFT features over a whole image [17,28]. It has also been reported that SIFT features fail to consider global context to resolve ambiguities that can occur locally in images, motivating solutions that increase the amount of global information used in the descriptors [10,24]. In this work, a suitable framework to compare views is developed, and it is shown how this remarkably enhances recognition performance.

The remainder of the paper is organized as follows. Section 2 presents the concept of landmark and visual saliency. In this context, landmarks are the visual elements that provide the features for the recognition of places, and they are found or defined according to classical visual saliency criteria, inspired by a biological model of visual opponency. Section 3 describes how to ref...
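The SIFT-based matching the introduction refers to is typically decided with Lowe's nearest-neighbour ratio test: a descriptor in one image is accepted as a match only if its closest descriptor in the other image is clearly closer than the second-closest. The sketch below illustrates that test on generic descriptor vectors; it is not the paper's method, and the descriptors and the 0.8 ratio threshold are conventional assumptions rather than values taken from this work.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Return index pairs (i, j) such that descriptor i of desc_a matches
    descriptor j of desc_b under Lowe's ratio test: the nearest neighbour
    must be closer than `ratio` times the second-nearest neighbour."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = sorted((euclidean(d, e), j) for j, e in enumerate(desc_b))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches

# Toy 2-D "descriptors": each point in desc_a has one clear counterpart.
desc_a = [[0.0, 0.0], [5.0, 5.0]]
desc_b = [[0.1, 0.0], [5.0, 5.1], [10.0, 10.0]]
matches = ratio_test_matches(desc_a, desc_b)
```

Ambiguous descriptors (two near-equidistant candidates) are rejected by the test, which is exactly the local-ambiguity problem the cited works [10,24] address by adding global context.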