Colorectal cancer is a major health problem, where advances towards computer-aided diagnosis (CAD) systems to assist the endoscopist can be a promising path to improvement. Here, a deep learning model for real-time polyp detection based on a pre-trained YOLOv3 (You Only Look Once) architecture and complemented with a post-processing step based on an object-tracking algorithm to reduce false positives is reported. The base YOLOv3 network was fine-tuned using a dataset composed of 28,576 images labelled with locations of 941 polyps that will be made public soon. In a frame-based evaluation using isolated images containing polyps, a general F1 score of 0.88 was achieved (recall = 0.87, precision = 0.89), with lower predictive performance in flat polyps, but higher for sessile, and pedunculated morphologies, as well as with the usage of narrow band imaging, whereas polyp size < 5 mm does not seem to have significant impact. In a polyp-based evaluation using polyp and normal mucosa videos, with a positive criterion defined as the presence of at least one 50-frames-length (window size) segment with a ratio of 75% of frames with predicted bounding boxes (frames positivity), 72.61% of sensitivity (95% CI 68.99–75.95) and 83.04% of specificity (95% CI 76.70–87.92) were achieved (Youden = 0.55, diagnostic odds ratio (DOR) = 12.98). When the positive criterion is less stringent (window size = 25, frames positivity = 50%), sensitivity reaches around 90% (sensitivity = 89.91%, 95% CI 87.20–91.94; specificity = 54.97%, 95% CI 47.49–62.24; Youden = 0.45; DOR = 10.76). The object-tracking algorithm has demonstrated a significant improvement in specificity whereas maintaining sensitivity, as well as a marginal impact on computational performance. These results suggest that the model could be effectively integrated into a CAD system.