Vision-based approaches for mobile indoor localization do not rely on infrastructure and are therefore scalable and cheap. The particular requirements for a navigation user interface in a vision-based system, however, have not been investigated so far. Such interfaces should adapt to localization accuracy, which strongly relies on distinctive reference images, and to other factors, such as the phone's pose. If necessary, the system should motivate the user to point the smartphone at distinctive regions to improve localization quality. We present a combined interface of Virtual Reality (VR) and Augmented Reality (AR) elements with indicators that communicate and ensure localization accuracy. In an evaluation with 81 participants, we found that AR was preferred in case of reliable localization, but with VR, navigation instructions were perceived as more accurate in case of localization and orientation errors. The additional indicators showed potential for making users choose distinctive reference images for reliable localization.