In this paper we present a novel approach to global localization using an RGB-D camera in maps of visual features. For large maps, the performance of pure image matching techniques decays in terms of robustness and computational cost. Particularly, repeated occurrences of similar features due to repeating structure in the world (e.g., doorways, chairs, etc.) or missing associations between observations pose critical challenges to visual localization. We address these challenges using a two-step approach. We first estimate a candidate pose using few correspondences between features of the current camera frame and the feature map. The initial set of correspondences is established by proximity in feature space. The initial pose estimate is used in the second step to guide spatial matching of features in 3D, i.e., searching for associations where the image features are expected to be found in the map. A RANSAC algorithm is used to compute a fine estimation of the pose from the correspondences. Our approach clearly outperforms localization based on feature matching exclusively in feature space, both in terms of estimation accuracy and robustness to failure and allows for global localization in real time (30 Hz).