Abstract-Deriving a three-dimensional camera position from monocular camera vision requires a georeference database. A floor plan is a ubiquitous georeference database that every building relies on during construction and facility maintenance. Compared with other popular georeference databases such as geo-tagged photos, generating, updating and maintaining a floor plan database does not require costly and time-consuming survey tasks. In vision-based methods, the camera needs special attention: in contrast to other sensors, vision sensors typically yield vast amounts of information, which calls for complex strategies to permit use in real time and on computationally constrained platforms. This research work shows that a map-based visual odometry strategy derived from a state-of-the-art structure-from-motion framework is particularly suitable for locally stable, pose-controlled flight. Issues concerning drift and robustness are analyzed and discussed with respect to the original framework. Additionally, various uses of vision-based localization algorithms are proposed. A noteworthy downside of vision-based algorithms, however, is their lack of robustness. Most approaches are sensitive to scene variations (such as seasonal or environmental changes) because they rely on the Sum of Squared Differences (SSD). To avoid this, we use Mutual Information, which is highly robust to global and local scene variations. Dense approaches, on the other hand, are frequently associated with drift. Here, we attempt to address this issue by using geo-referenced images. The localization algorithm has been implemented, and experimental results are reported. Vision sensors have the potential to extract information about the surrounding environment and determine the locations of features or points of interest.
Once landmarks in an unknown environment have been mapped, subsequent observations by the vision sensor can in turn be used to resolve position and orientation while new features continue to be mapped. In addition, the experimental results of the proposed model suggest a plausibility proof for feed-forward models of delineate recognition in geo-location.
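
To illustrate why Mutual Information tolerates appearance changes that defeat SSD, the following minimal sketch compares both measures on synthetic data. It is not the implementation used in this work: the function names, the 32-bin joint histogram, and the intensity inversion used as a stand-in for a global scene change are all assumptions made for illustration.

```python
import numpy as np

def ssd(a, b):
    # Sum of Squared Differences: lower means more similar.
    d = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(d * d))

def mutual_information(a, b, bins=32):
    # Mutual Information (in nats) estimated from a joint intensity histogram:
    # higher means the two images share more information.
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()                 # joint probability
    px = pxy.sum(axis=1, keepdims=True)       # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of b, shape (1, bins)
    nz = pxy > 0                              # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

# Synthetic stand-ins (assumed data, not from the experiments):
rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
relit = 255.0 - reference                     # same scene, inverted intensities
unrelated = rng.integers(0, 256, size=(64, 64)).astype(np.float64)

ssd_relit = ssd(reference, relit)
ssd_unrelated = ssd(reference, unrelated)
mi_relit = mutual_information(reference, relit)
mi_unrelated = mutual_information(reference, unrelated)

print("SSD same scene, changed appearance:", ssd_relit)
print("SSD unrelated scene:               ", ssd_unrelated)
print("MI  same scene, changed appearance:", mi_relit)
print("MI  unrelated scene:               ", mi_unrelated)
```

In this toy setting SSD rates the appearance-changed view of the same scene as a worse match than an unrelated image, whereas Mutual Information still ranks it as the better match, because it depends on the statistical relationship between intensities rather than on their absolute values.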