Visual map-based robot navigation is a strategy that only uses the robot vision system, involving four fundamental stages: learning or mapping, localization, planning, and navigation. Therefore, it is paramount to model the environment optimally to perform the aforementioned stages. In this paper, we propose a novel framework to generate a visual map for environments both indoors and outdoors. The visual map comprises key images sharing visual information between consecutive key images. This learning stage employs a pre-trained local feature transformer (LoFTR) constrained with a 3D projective transformation (a fundamental matrix) between two consecutive key images. Outliers are efficiently detected using marginalizing sample consensus (MAGSAC) while estimating the fundamental matrix. We conducted extensive experiments to validate our approach in six different datasets and compare its performance against hand-crafted methods.