Insect pests are a major factor limiting agricultural production. According to the Food and Agriculture Organization (FAO), an estimated 20–40% of global crop production is lost to pests each year, which poses a major challenge to crop production. These insect pests suck the sap from the crop’s organs, especially leaves, fruits, stems, and roots, and cause sooty mold disease. To control these pests, pesticides are frequently used because they are fast-acting and scalable. However, because of environmental pollution and growing health awareness, reduced pesticide use is recommended. One salient approach is to replace widespread application with on-demand spot spraying, which requires that the location of the pest first be determined. At the same time, the growing population and increasing food demand call for novel methods and systems for agricultural production that address environmental concerns while ensuring efficiency and sustainability. Consequently, early and accurate insect pest detection and classification have recently come into high demand. This study therefore aims to develop an object recognition system for the detection and classification of crop-damaging insect pests. We propose an automatic system, in the form of a smartphone IP-camera, that detects insect pests in digital images and videos to reduce farmers’ reliance on pesticides. The proposed approach is based on YOLO object detection architectures, including YOLOv5 (n, s, m, l, and x), YOLOv3, YOLO-Lite, and YOLOR. To train these detectors, we collected 7046 images in the wild under different illumination and background conditions. We trained and tested the object recognition system from scratch with different parameters, and compared and analyzed the eight models. 
The experimental results show that the average precision (AP@0.5) of the eight models, namely YOLO-Lite, YOLOv3, YOLOR, and YOLOv5 at five scales (n, s, m, l, and x), reaches 51.7%, 97.6%, 96.80%, 83.85%, 94.61%, 97.18%, 97.04%, and 98.3%, respectively. Among the YOLOv5 variants, larger models generally achieve higher average precision in the detection validation results, with YOLOv5x scoring highest. We observed that the YOLOv5x model is fully functional and correctly identifies the twenty-three species of insect pests with an inference time of 40.5 milliseconds (ms). The developed YOLOv5x model achieves state-of-the-art performance on our IP-23 dataset, with a mean average precision (mAP@0.5) of 98.3%, mAP@0.5:0.95 of 79.8%, precision of 94.5%, recall of 97.8%, and F1-score of 96%. The results show that the system works efficiently and correctly detects and identifies insect pests, so it can be employed in realistic farming applications.
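The reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean. The following is a minimal sketch of that arithmetic only; the function name `f1_score` is a hypothetical helper introduced here for illustration and is not taken from the study's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        # Avoid division by zero when the detector has no correct detections.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision (94.5%) and recall (97.8%) reported for YOLOv5x on the IP-23 dataset.
f1 = f1_score(0.945, 0.978)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.961"
```

The result, roughly 96.1%, agrees with the F1-score of 96% stated above.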