In biological vision systems, attention mechanisms select the relevant information from the sensed field of view, so that the complete scene can be analyzed through a sequence of rapid eye saccades. In recent years, efforts have been made to imitate this attention behavior in artificial vision systems, because it makes better use of computational resources by focusing processing on a small set of selected regions. In the framework of mobile robot navigation, this work proposes an artificial attention model in which attention is deployed at the level of objects (visual landmarks) and which employs new processes for estimating bottom-up and top-down (target-based) saliency maps. Bottom-up attention is implemented through a hierarchical process whose final result is the perceptual grouping of the image content. The hierarchical grouping is performed with a Combinatorial Pyramid, which represents each level of the hierarchy by a combinatorial map. The process takes into account both image regions (faces in the map) and edges (arcs in the map). Top-down attention searches for previously detected landmarks, enabling their re-detection when the robot presumes that it is revisiting a known location. Landmarks are described by a combinatorial submap; this search is therefore conducted through an error-tolerant submap isomorphism procedure.
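To make the combinatorial map concrete, the following is a minimal sketch of a 2D combinatorial map in the standard textbook formulation, not the implementation used in this work. It assumes darts are encoded as nonzero integers, that the involution pairing the two darts of an arc is alpha(d) = -d, and that faces are the orbits of phi = sigma ∘ alpha. The names `CombinatorialMap` and `orbits` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def orbits(perm: Dict[int, int]) -> List[List[int]]:
    """Return the cycles (orbits) of a permutation stored as a dict."""
    seen = set()
    result = []
    for start in perm:
        if start in seen:
            continue
        orbit = []
        d = start
        while d not in seen:
            seen.add(d)
            orbit.append(d)
            d = perm[d]
        result.append(orbit)
    return result


class CombinatorialMap:
    """2D combinatorial map G = (D, sigma, alpha) with alpha(d) = -d (assumed encoding)."""

    def __init__(self, sigma: Dict[int, int]):
        self.sigma = dict(sigma)
        darts = set(self.sigma)
        # Every dart d needs its opposite dart -d so that alpha is an involution.
        if 0 in darts or any(-d not in darts for d in darts):
            raise ValueError("darts must be nonzero and come in pairs d, -d")
        # sigma must map the dart set onto itself, i.e. be a permutation.
        if set(self.sigma.values()) != darts:
            raise ValueError("sigma must be a permutation of the darts")

    def vertices(self) -> List[List[int]]:
        # Orbits of sigma: the darts encountered when turning around a vertex.
        return orbits(self.sigma)

    def arcs(self) -> List[List[int]]:
        # Orbits of alpha: the two darts that make up each arc.
        return orbits({d: -d for d in self.sigma})

    def faces(self) -> List[List[int]]:
        # Orbits of phi = sigma o alpha: the darts bounding each face.
        return orbits({d: self.sigma[-d] for d in self.sigma})


# A single triangle: arcs 1, 2, 3, each split into darts d and -d.
triangle = CombinatorialMap({1: -3, -3: 1, -1: 2, 2: -1, -2: 3, 3: -2})
n_v, n_e, n_f = len(triangle.vertices()), len(triangle.arcs()), len(triangle.faces())
```

For the triangle, the map has three vertices, three arcs, and two faces (the inner region and the background), which satisfies Euler's formula V - E + F = 2 for a connected planar map. Because only the permutations are stored, regions (faces) and their boundaries (arcs) are available from the same structure.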