The agricultural industry plays a vital role in meeting the global demand for food. As the population grows, there is an increasing need for efficient farming practices that can maximize crop yields. Conventional methods of harvesting lettuce often rely on manual labor, which can be time-consuming, labor-intensive, and prone to human error. These challenges have motivated research into automation technologies, such as robotics, to improve harvest efficiency and reduce reliance on human intervention. Deep learning-based object detection models have shown impressive success in various computer vision tasks, such as object recognition. The RetinaNet model can be trained to identify and localize lettuce accurately. However, to deploy object recognition models in real-world agricultural scenarios, pre-trained models must be fine-tuned to adapt to the specific characteristics of lettuce, such as shape, size, and occlusion. Fine-tuning the models on lettuce-specific datasets can improve their accuracy and robustness for detecting and localizing lettuce. The fine-tuned RetinaNet achieved its highest accuracy of 0.782, a recall of 0.844, an F1-score of 0.875, and an mAP of 0.962; for each of these metrics, a higher score indicates better model performance.
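The reported metrics follow the standard definitions used in object detection evaluation: precision and recall are computed from true-positive, false-positive, and false-negative counts, and F1 is their harmonic mean. The sketch below illustrates these relationships; the detection counts are hypothetical and do not come from the lettuce dataset (mAP, which additionally averages precision over recall levels and classes, is omitted for brevity).

```python
def precision(tp: int, fp: int) -> float:
    # Fraction of predicted detections that are correct.
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    # Fraction of ground-truth objects that were detected.
    return tp / (tp + fn)

def f1_score(p: float, r: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)

# Hypothetical counts from a single evaluation run (not the paper's data):
tp, fp, fn = 84, 10, 16
p = precision(tp, fp)
r = recall(tp, fn)
print(f"precision={p:.3f} recall={r:.3f} f1={f1_score(p, r):.3f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, so a model cannot compensate for poor recall with high precision alone.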