Computer vision continues to grow in popularity as a research field. In a surveillance-oriented computer vision application, object detection and tracking are the core tasks: an object of interest must be segmented from a sequence of video frames and then followed over time, and both steps are performed with computer vision algorithms. When the camera is fixed and the background remains static, moving objects can be detected with relatively straightforward methods. Aerial surveillance, by contrast, is characterized by the fact that the target, the background, and the video camera are all in constant motion. Targets in video captured by an unmanned aerial vehicle (UAV) can be recognized by combining the mean shift tracking technique with a deep convolutional neural network (DCNN). It is critical that the detection algorithm remain accurate under changing illumination, dynamic clutter, and changes in the scene environment. Although several approaches exist for identifying moving objects in video, background subtraction is the most widely used. In this work, a mean shift tracking technique built on an adaptive background model is presented and implemented. The background model is estimated and updated frame by frame, so the occlusion problem is effectively eliminated. The tracking algorithm operates on the same video stream that is used for target detection. The algorithms are simulated in MATLAB, and their performance is evaluated with image-based and video-based metrics to assess how well they would operate in real-world conditions.
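The frame-by-frame background update can be illustrated with a simple running-average scheme. The sketch below is a minimal illustration in Python with OpenCV rather than the paper's MATLAB implementation; the learning rate, threshold, and function name detect_foreground are illustrative assumptions, not values or identifiers from the paper. It shows how a background estimate updated every frame absorbs gradual scene changes while moving targets are segmented as foreground.

import cv2
import numpy as np

# Illustrative parameters (not taken from the paper):
ALPHA = 0.05          # background update rate per frame
FG_THRESHOLD = 30     # grey-level difference treated as foreground

def detect_foreground(video_path):
    """Running-average background subtraction: the background model is
    re-estimated every frame, so gradual illumination changes are absorbed."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        raise IOError("cannot read video: " + video_path)

    # Initialise the background model from the first frame (greyscale, float).
    background = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)

        # Pixels far from the current background estimate are foreground.
        diff = cv2.absdiff(grey, background)
        mask = (diff > FG_THRESHOLD).astype(np.uint8) * 255
        mask = cv2.medianBlur(mask, 5)

        # Update the background model frame by frame.
        background = ALPHA * grey + (1.0 - ALPHA) * background

        yield frame, mask

    cap.release()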
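The tracking stage can then consume the same stream and the same foreground output as the detector. The following sketch, which assumes the detect_foreground generator from the previous listing is defined in the same module, feeds the foreground mask to OpenCV's cv2.meanShift as the probability image; initialising the search window from the largest foreground blob is an illustrative choice, not necessarily the procedure used in the paper.

import cv2

def track_with_mean_shift(video_path):
    """Mean shift tracking driven by the foreground masks produced by the
    adaptive background model sketched above."""
    frames = detect_foreground(video_path)

    # Initialise the search window from the largest foreground blob
    # in the first frame that contains any motion (OpenCV 4.x API).
    track_window = None
    for frame, mask in frames:
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if contours:
            track_window = cv2.boundingRect(max(contours, key=cv2.contourArea))
            break
    if track_window is None:
        return

    # Stop mean shift after 10 iterations or when the window moves < 1 px.
    term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

    for frame, mask in frames:
        # The foreground mask acts as the probability image, so the tracker
        # works on the same video stream (and detections) as the detector.
        _, track_window = cv2.meanShift(mask, track_window, term_crit)
        x, y, w, h = track_window
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("mean shift tracking", frame)
        if cv2.waitKey(30) & 0xFF == 27:   # Esc to quit
            break

    cv2.destroyAllWindows()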
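The specific image-based and video-based metrics are not named in this section, so the sketch below uses two common stand-ins as assumptions: intersection-over-union of the detected foreground mask as an image-based measure, and mean centre location error of the tracked window as a video-based measure.

import numpy as np

def mask_iou(predicted_mask, ground_truth_mask):
    """Image-based measure: intersection over union of binary masks."""
    pred = predicted_mask > 0
    gt = ground_truth_mask > 0
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, gt).sum()) / float(union)

def center_location_error(track_boxes, ground_truth_boxes):
    """Video-based measure: mean Euclidean distance between the centres of
    the tracked and ground-truth boxes, averaged over all frames."""
    errors = []
    for (x, y, w, h), (gx, gy, gw, gh) in zip(track_boxes, ground_truth_boxes):
        cx, cy = x + w / 2.0, y + h / 2.0
        gcx, gcy = gx + gw / 2.0, gy + gh / 2.0
        errors.append(np.hypot(cx - gcx, cy - gcy))
    return float(np.mean(errors)) if errors else 0.0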