This paper presents a novel method and apparatus for building three-dimensional (3D) dense visual maps of large-scale unstructured environments for autonomous navigation and real-time localization. The main contribution is an efficient and accurate 3D world representation that extends the boundaries of state-of-the-art dense visual mapping to large scales. This is achieved via an omnidirectional key-frame representation of the environment, which can synthesize photorealistic views of captured environments at arbitrary locations. Locally, the representation is image-based (egocentric) and is composed of accurate augmented spherical panoramas combining photometric information (RGB), depth information (D), and saliency for all viewing directions at a particular point in space (i.e., a point in the light field). The spheres are related by a graph of six-degree-of-freedom (DOF) poses (3-DOF translation and 3-DOF rotation) estimated through multiview spherical registration. It is shown that this world representation can be used to perform robust real-time localization (in 6 DOF) of any configuration of visual sensors within their environment, whether monocular, stereo, or multiview. In contrast to feature-based approaches, an efficient direct image registration technique is formulated that directly exploits the advantages of the spherical representation by minimizing a photometric error between a current image and a reference sphere. Two novel multicamera acquisition systems have been developed and calibrated to acquire this information, the second of which is reported here for the first time. Given the robustness and efficiency of this representation, field experiments demonstrating autonomous navigation and large-scale mapping are reported in detail for challenging unstructured environments containing vegetation, pedestrians, trams, dense traffic, and varying illumination conditions.
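For illustration only, the direct registration objective mentioned above can be sketched in a generic form; the notation below ($I^{*}$, $I$, $w$, $\mathbf{p}_i$, $\mathbf{T}$) is illustrative and is not the paper's own:

$$\hat{\mathbf{T}} \;=\; \arg\min_{\mathbf{T} \in \mathrm{SE}(3)} \;\sum_{i} \Big( I\big(w(\mathbf{T};\, \mathbf{p}_i, d_i)\big) \;-\; I^{*}(\mathbf{p}_i) \Big)^{2},$$

where $I^{*}$ denotes the intensities of the reference sphere, $I$ the current image, $\mathbf{p}_i$ the spherical pixels with valid depth $d_i$, and $w$ the warp induced by the candidate 6-DOF pose $\mathbf{T}$ together with the sphere's depth. In such direct formulations the photometric error is minimized iteratively over the pose, avoiding explicit feature extraction and matching.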