This work presents a framework for building a visual model of an environment that can be used to estimate the position of a mobile robot by means of artificial intelligence techniques. The framework retrieves the structure of the environment from a dataset of omnidirectional images captured throughout it, each described with a global-appearance approach. The information is arranged in two layers with different levels of granularity: the first layer is obtained by means of classifiers, and the second is composed of a set of data-fitting neural networks. The model is then used to estimate the position of the robot hierarchically, by comparing the image captured from the unknown position with the information stored in the model. Five classifiers are evaluated (Naïve Bayes, SVM, random forest, a linear discriminant classifier and a classifier based on a shallow neural network) along with three global-appearance descriptors (HOG, gist and a descriptor calculated from an intermediate layer of a pre-trained CNN). The experiments are carried out on publicly available datasets of omnidirectional images captured indoors in the presence of dynamic changes. Three measures are used to assess the proposal: the ability of the algorithm to estimate the position coarsely (hit ratio), the average error (cm) and the required computing time. The results show that the framework is able to model the environment and localize the robot efficiently from the knowledge extracted from a set of omnidirectional images with the proposed artificial intelligence techniques.
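
The two-layer, coarse-to-fine estimation described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: a nearest-centroid rule stands in for the first-layer classifiers, an inverse-distance-weighted nearest-neighbour regressor stands in for the second-layer data-fitting neural networks, and the two-dimensional descriptors are synthetic rather than HOG, gist or CNN features. All names (`TwoLayerModel`, `localize`, `evaluate`) and the toy data are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class TwoLayerModel:
    """Hypothetical stand-in for the two-layer visual model."""

    def __init__(self, k=2):
        self.k = k
        self.centroids = {}  # layer 1: room label -> mean descriptor
        self.samples = {}    # layer 2: room label -> [(descriptor, (x, y))]

    def fit(self, descriptors, rooms, positions):
        for d, r, p in zip(descriptors, rooms, positions):
            self.samples.setdefault(r, []).append((d, p))
        for r, items in self.samples.items():
            self.centroids[r] = centroid([d for d, _ in items])
        return self

    def localize(self, descriptor):
        # Coarse step (assumed stand-in for the classifier): nearest centroid.
        room = min(self.centroids,
                   key=lambda r: distance(descriptor, self.centroids[r]))
        # Fine step (assumed stand-in for the data-fitting network):
        # inverse-distance-weighted mean of the k nearest samples in that room.
        nearest = sorted(self.samples[room],
                         key=lambda s: distance(descriptor, s[0]))[:self.k]
        weights = [1.0 / (distance(descriptor, d) + 1e-9) for d, _ in nearest]
        total = sum(weights)
        x = sum(w * p[0] for w, (_, p) in zip(weights, nearest)) / total
        y = sum(w * p[1] for w, (_, p) in zip(weights, nearest)) / total
        return room, (x, y)


def evaluate(model, descriptors, rooms, positions):
    """Return (hit ratio, average position error in cm) over the queries."""
    hits, errors = 0, []
    for d, r, p in zip(descriptors, rooms, positions):
        room, est = model.localize(d)
        hits += int(room == r)
        errors.append(distance(est, p))
    return hits / len(rooms), sum(errors) / len(errors)


# Synthetic toy data: two rooms, positions in cm, 2-D descriptors.
xs = range(0, 500, 100)
train_desc = [[x / 100.0, 0.0] for x in xs] + [[x / 100.0, 10.0] for x in xs]
train_rooms = ["A"] * 5 + ["B"] * 5
train_pos = [(float(x), 0.0) for x in xs] + [(float(x), 500.0) for x in xs]

model = TwoLayerModel(k=2).fit(train_desc, train_rooms, train_pos)
room, pos = model.localize([1.5, 0.2])
hit_ratio, mean_error_cm = evaluate(model, [[1.5, 0.2]], ["A"], [(150.0, 0.0)])
print(room, pos, hit_ratio, mean_error_cm)
```

Restricting the fine step to the samples of the room selected in the coarse step is what makes the estimation hierarchical: the query is compared against one room's data rather than the whole model, which is where a saving in computing time would be expected.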