Autonomous unmanned aerial vehicles work seamlessly within the GPS signal range, but their performance deteriorates in GPS-denied regions. This paper presents a unique collaborative computer vision-based approach for target tracking as per the image’s specific location of interest. The proposed method tracks any object without considering its properties like shape, color, size, or pattern. It is required to keep the target visible and line of sight during the tracking. The method gives freedom of selection to a user to track any target from the image and form a formation around it. We calculate the parameters like distance and angle from the image center to the object for the individual drones. Among all the drones, the one with a significant GPS signal strength or nearer to the target is chosen as the master drone to calculate the relative angle and distance between an object and other drones considering approximate Geo-location. Compared to actual measurements, the results of tests done on a quadrotor UAV frame achieve 99% location accuracy in a robust environment inside the exact GPS longitude and latitude block as GPS-only navigation methods. The individual drones communicate to the ground station through a telemetry link. The master drone calculates the parameters using data collected at ground stations. Various formation flying methods help escort other drones to meet the desired objective with a single high-resolution first-person view (FPV) camera. The proposed method is tested for Airborne Object Target Tracking (AOT) aerial vehicle model and achieves higher tracking accuracy.