Computer vision has become an important technology for extracting meaningful data from visual content, providing valuable information for improving security controls, marketing, and logistics strategies across diverse industrial and business sectors. The retail sector constitutes an important part of the global economy, and analyzing customer data and shopping behavior has become essential to deliver the right products to customers, maximize profits, and increase competitiveness. Despite the rise of online retail, in-person shopping remains a predominant form of retail. Consequently, physical retailers are adopting computer vision models to monitor products and customers in their stores. This paper presents a computer vision solution developed by Lytica Company to detect customers in Steren's physical retail stores in Mexico. Several current object detection models, namely SSD MobileNetV2, YOLO-FastestV2, YOLOv5, and YOLOXn, were analyzed to identify the most accurate system given the conditions and characteristics of the available devices. The challenges addressed during video analysis included occlusion and proximity between customers, lighting conditions, the position and distance of the camera relative to customers entering the store, image quality, and the scalability of the process. The models were evaluated using the F1-score: YOLO-FastestV2 achieved 0.64, SSD MobileNetV2 0.74, YOLOv5n 0.86, YOLOv5xs 0.86, and YOLOXn 0.74. Although YOLOv5 achieved the highest F1-score, YOLOXn offered the best balance between detection performance and frame rate (frames per second, FPS), given the limited hardware and computing power available.
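For reference, the F1-score used to compare the models is the harmonic mean of precision and recall, which simplifies to 2TP / (2TP + FP + FN). The minimal Python sketch below computes it from true-positive (TP), false-positive (FP), and false-negative (FN) detection counts. The function name `f1_score` and the example counts are hypothetical and for illustration only; the abstract does not state how detections were matched to annotated customers (for example, which IoU threshold was used), so the counts are assumed to come from such a matching step.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score (harmonic mean of precision and recall) from detection counts."""
    # Precision: fraction of predicted detections that correspond to a real customer.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: fraction of real customers that the model detected.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Guard against division by zero when the model detects nothing correctly.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only; these are not results from this study.
print(round(f1_score(tp=86, fp=14, fn=14), 2))  # prints 0.86
```

Because F1 is a harmonic mean, a model cannot reach a high score by maximizing only precision or only recall, which makes it a reasonable single figure for comparing detectors whose errors differ in kind.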