In this paper, we explore the potential of using onboard cameras and pre-stored geo-referenced imagery for Unmanned Aerial Vehicle (UAV) localization. Such a vision-based localization system is of vital importance in situations where the integrity of the Global Positioning System (GPS) is in question (e.g., during GPS outages or jamming). To this end, we propose a fully trainable pipeline that localizes an aerial image within a pre-stored orthomosaic map in the context of UAV localization. The proposed deep architecture extracts features from the aerial image and localizes it within a larger, predefined, geotagged image. The idea is to train a deep learning model to find neighborhood consensus patterns, which encapsulate the local patterns in the neighborhood of the established dense feature correspondences, by introducing semi-local constraints. We evaluate the performance of our approach qualitatively and quantitatively on real UAV imagery, with training and testing data acquired via multiple flights over different regions. The source code, along with the entire dataset, including the annotations of the collected images, has been made public.¹ To the best of our knowledge, this dataset is the first of its kind: it consists of 2052 high-resolution aerial images acquired at different times over three different areas in Pakistan, spanning a total area of around 2 km².
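The neighborhood-consensus idea can be illustrated with a minimal sketch: build a 4D correlation tensor between dense features of the aerial image and the map tile, then re-score each candidate correspondence by the agreement of its spatial neighbors that share the same displacement. The averaging filter below is a hypothetical stand-in for the learned 4D consensus filtering described in the paper; the function names and the circular-shift approximation at the borders are our own assumptions, not the authors' implementation.

```python
import numpy as np

def correlation_4d(fa, fb):
    """Dense 4D correlation between two feature maps.

    fa, fb: (h, w, d) arrays of L2-normalized descriptors.
    Returns c with c[i, j, k, l] = <fa[i, j], fb[k, l]>.
    """
    return np.tensordot(fa, fb, axes=([2], [2]))  # shape (h, w, h, w)

def neighborhood_consensus(c, radius=1):
    """Re-score correspondences by semi-local support.

    A simple averaging stand-in for learned 4D consensus filtering:
    each match (i, j) -> (k, l) is averaged with neighboring matches
    (i+di, j+dj) -> (k+di, l+dj) that share the same displacement.
    Borders are handled by circular shift (np.roll) for brevity.
    """
    acc = np.zeros_like(c)
    n = 0
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            acc += np.roll(c, shift=(di, dj, di, dj), axis=(0, 1, 2, 3))
            n += 1
    return acc / n
```

In this toy setting, a map tile that is a pure translation of the query keeps a perfect consensus score along the true displacement, while spurious high correlations are suppressed because their neighbors do not agree on the same shift.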