This paper presents “EspiNet V2”, a deep learning model based on the region-based detector Faster R-CNN. The model is used to detect motorcycles in urban environments, where occlusion is likely. Two datasets are used for training: the Urban Motorbike Dataset (UMD-10K), with 10,000 annotated images, and the new Secretaría de Movilidad Motorbike Dataset (SMMD), with 5,000 images captured from the traffic control CCTV system in Medellín (Colombia). On UMD-10K, the model reaches an average precision (AP) of 88.8%, even though 60% of the motorcycles are occluded and the images were captured from a low angle with a moving camera. On SMMD, it reaches an AP of 79.5%. EspiNet V2 outperforms popular models such as YOLO V3 and Faster R-CNN (VGG16-based) trained end-to-end on those datasets.