Multiple-object detection and tracking are vital components of many intelligent applications. Object detection pinpoints an object's location in a scene, whereas object tracking links the detected object across a sequence of frames. Over the last few decades, a wide range of approaches has been developed, and these can be divided into 2D and stereo-based 3D techniques. Most of these strategies yield reliable results when used under restricted conditions. Such restrictive assumptions are made to reduce the number of complex factors that object detection and tracking inherently involve. The most commonly held assumptions concern environmental factors, object appearance, flow density, background colour and intensity information, the length of time an object remains in the scene, object occlusion, and the maximum number of objects in a scene. In real-time applications, however, the reliability of these approaches is not guaranteed. A modern surveillance system needs reliable object detection and tracking in open, unconstrained areas.