Object detection is a computer vision technique for identifying specific instances of an object inside an image or video. When people look at pictures or videos, they can recognize and locate the things that interest them; the primary goal of object detection is to reproduce this ability with a computer. Alongside deep learning and machine learning-based object detection, common techniques include image segmentation and blob analysis, which uses simple object properties such as size, shape, or color, and feature-based object detection, which uses feature extraction, matching, and RANSAC to estimate the location of an object. The most challenging aspects of object detection are classifying objects and pinpointing their locations. Seen from a different viewpoint, the same object can look completely different, and because the apparent sizes and proportions of objects almost always change, detection algorithms struggle to distinguish objects across scales and viewpoints. This project proposes to determine an object's shape from camera input. The system currently distinguishes three shapes: circle, square, and triangle. The identification process is implemented in MATLAB, and the result is shown in a MATLAB graphical user interface (GUI). The application extracts features from a selected video frame using the SFTA (Segmentation-based Fractal Texture Analysis) algorithm and classifies the frame with a neural network. Image segmentation and image detection are related computer vision methods that are closely tied to object recognition.
These techniques help us understand and analyze the scenes shown in videos and still images. The suitability of the proposed system was evaluated, and testing showed that it gives generally accurate results, at a rate of 99.4%.
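To make the blob-analysis idea mentioned above concrete, the following is a minimal sketch in Python of telling a circle, a square, and a triangle apart from one simple shape property. It is not the SFTA and neural network pipeline used in this work, and it is not MATLAB code. The function names (`make_mask`, `extent`, `classify`), the synthetic test shapes, and the thresholds 0.9 and 0.65 are all assumptions made for illustration. The property used is the extent, the ratio of blob area to bounding-box area, which is about 1 for an axis-aligned square, about π/4 for a circle, and about 0.5 for a triangle. The sketch assumes a single, filled, axis-aligned blob; a rotated square, for example, would be misclassified.

```python
def make_mask(shape, size=101):
    """Rasterize one filled shape into a binary grid (hypothetical test data)."""
    c = (size - 1) / 2      # center of the grid
    r = size * 0.4          # half-width of the shape
    mask = []
    for y in range(size):
        row = []
        for x in range(size):
            if shape == "circle":
                inside = (x - c) ** 2 + (y - c) ** 2 <= r ** 2
            elif shape == "square":
                inside = abs(x - c) <= r and abs(y - c) <= r
            elif shape == "triangle":
                # Isosceles triangle: apex at the top, base at the bottom.
                t = (y - (c - r)) / (2 * r)   # 0 at the apex, 1 at the base
                inside = 0 <= t <= 1 and abs(x - c) <= r * t
            else:
                raise ValueError(f"unknown shape: {shape}")
            row.append(inside)
        mask.append(row)
    return mask


def extent(mask):
    """Return blob area divided by the area of its bounding box."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, filled in enumerate(row) if filled]
    if not points:
        raise ValueError("mask contains no foreground pixels")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    box_area = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
    return len(points) / box_area


def classify(mask):
    """Classify a single axis-aligned blob by its extent (assumed thresholds)."""
    e = extent(mask)
    if e > 0.9:       # a square fills its bounding box
        return "square"
    if e > 0.65:      # a circle fills roughly pi/4 of its bounding box
        return "circle"
    return "triangle"  # a triangle fills roughly half of its bounding box


if __name__ == "__main__":
    for name in ("circle", "square", "triangle"):
        print(name, "->", classify(make_mask(name)))
```

A single hand-picked property like this is fragile under rotation, noise, and partial occlusion, which is one reason to prefer learned features and a trained classifier, as the system described here does.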