Coffee farmers lack efficient tools for obtaining sufficient and reliable information on the maturation stage of coffee fruits before harvest. In this study, we propose a computer vision system to detect and classify Coffea arabica L. fruits on tree branches into three classes: unripe (green), ripe (cherry), and overripe (dry). The deep learning object detector YOLO (You Only Look Once) was trained on 387 images of coffee branches taken with a smartphone. YOLOv3 and YOLOv4, together with their smaller (tiny) versions, were assessed for fruit detection. YOLOv4 and YOLOv4-tiny performed better than YOLOv3, especially at smaller network sizes. At a network size of 800 × 800 pixels, the mean average precision (mAP) was 81 %, 79 %, 78 %, and 77 % for YOLOv4, YOLOv4-tiny, YOLOv3, and YOLOv3-tiny, respectively. Despite the similar overall performance, the YOLOv4 feature extractor was more robust in images with greater object densities and in the detection of unripe fruits, which are generally more difficult to detect because of their color similarity to leaves in the background, partial occlusion by leaves and other fruits, and lighting effects. This study shows the potential of computer vision systems based on deep learning to guide the decision-making of coffee farmers more objectively.
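For readers unfamiliar with the mAP metric reported above, the following minimal Python sketch illustrates how an average precision (AP) per class and its mean over the three maturation classes could be computed. It is not the evaluation code used in the study: the IoU threshold of 0.5, the all-point interpolation of the precision-recall curve, the single-image scope, and all function names and box coordinates are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[float, Box]],  # (confidence, box) for ONE class
    truths: List[Box],                    # ground-truth boxes for that class
    iou_thr: float = 0.5,                 # assumed threshold, not from the study
) -> float:
    """AP for one class (toy scope: a real evaluation pools all test images)."""
    if not truths:
        return 0.0
    matched = [False] * len(truths)
    tp_flags = []
    # Visit detections from most to least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, truth in enumerate(truths):
            overlap = iou(box, truth)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        # True positive only if the best-overlapping truth is still unmatched.
        hit = best_j >= 0 and best_iou >= iou_thr and not matched[best_j]
        if hit:
            matched[best_j] = True
        tp_flags.append(hit)

    # Cumulative precision and recall after each detection.
    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += int(flag)
        precisions.append(tp / k)
        recalls.append(tp / len(truths))

    # Make precision non-increasing (all-point interpolation), then integrate.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class: Dict[str, float]) -> float:
    """mAP: unweighted mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical boxes for one class: two fruits, three detections (one spurious).
toy_truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
toy_detections = [
    (0.9, (0, 0, 10, 10)),
    (0.8, (50, 50, 60, 60)),
    (0.7, (20, 20, 30, 30)),
]
toy_ap = average_precision(toy_detections, toy_truths)
```

In this sketch, a detection counts as a true positive only when it overlaps a not-yet-matched ground-truth box of the same class by at least the chosen IoU threshold; the per-class AP values for unripe, ripe, and overripe fruits would then be averaged with `mean_average_precision`.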