Scene modeling plays a key role in applications ranging from visual mapping to augmented reality. This paper presents an end-to-end solution for creating accurate three-dimensional (3D) textured models from monocular video sequences. The methods are developed within the framework of sequential structure from motion, in which a 3D model of the environment is maintained and updated as new visual information becomes available. The proposed approach makes contributions at several levels. The camera pose is recovered by directly associating the 3D scene model with local image observations, using a dual-registration approach. Compared to standard structure from motion techniques, this approach decreases error accumulation, increases robustness to scene occlusions and feature association failures, and allows 3D reconstruction of any type of scene. Motivated by the need to map large areas, a novel 3D vertex selection mechanism is proposed that takes into account the geometry of the scene. Vertices are selected not only for high reconstruction accuracy but also to be representative of the local shape of the scene. This reduces the complexity of the final 3D model with minimal loss of precision. We also present a method for blending image textures using 3D geometric information and photometric differences between registered textures. The method allows high-quality mosaicing over 3D surfaces by reducing the effects of the distortions induced by camera viewpoint and illumination changes. Results are presented for four scene modeling scenarios, including a comparison with ground truth under a realistic scenario and a challenging underwater data set. Although developed primarily for underwater mapping applications, the methods are general and applicable to other domains, such as aerial and land-based mapping. © 2009 Wiley Periodicals, Inc.