Glaucoma is an eye disease that gradually deteriorates vision. Much research focuses on extracting information from the optic disc and optic cup, the structure used for measuring the cup-to-disc ratio. These structures are commonly segmented with deeplearning techniques, primarily using Encoder–Decoder models, which are hard to train and time-consuming. Object detection models using convolutional neural networks can extract features from fundus retinal images with good precision. However, the superiority of one model over another for a specific task is still being determined. The main goal of our approach is to compare object detection model performance to automate segment cups and discs on fundus images. This study brings the novelty of seeing the behavior of different object detection models in the detection and segmentation of the disc and the optical cup (Mask R-CNN, MS R-CNN, CARAFE, Cascade Mask R-CNN, GCNet, SOLO, Point_Rend), evaluated on Retinal Fundus Images for Glaucoma Analysis (REFUGE), and G1020 datasets. Reported metrics were Average Precision (AP), F1-score, IoU, and AUCPR. Several models achieved the highest AP with a perfect 1.000 when the threshold for IoU was set up at 0.50 on REFUGE, and the lowest was Cascade Mask R-CNN with an AP of 0.997. On the G1020 dataset, the best model was Point_Rend with an AP of 0.956, and the worst was SOLO with 0.906. It was concluded that the methods reviewed achieved excellent performance with high precision and recall values, showing efficiency and effectiveness. The problem of how many images are needed was addressed with an initial value of 100, with excellent results. Data augmentation, multi-scale handling, and anchor box size brought improvements. The capability to translate knowledge from one database to another shows promising results too.