This research describes "EspiNet", a deep learning convolutional neural network model, used in conjunction with a Markov Decision Process (MDP) tracker for the detection and tracking of occluded motorcycles in urban environments. The model is trained and evaluated on a new public dataset of up to 10,000 annotated images, created for this research and captured in real urban traffic scenes. Images were captured using a moving camera mounted on a drone, and more than 60% of the motorcycles in them are affected by occlusions. The network design was refined through many tests, achieving a promising average precision (AP) of 88.84% despite the considerable number of occluded vehicles, the movement of the camera, and the low angle used for capture. The model's predictions are used as input to an MDP tracker, reaching up to 85.2% in Multiple Object Tracking Accuracy (MOTA). The proposed network architecture outperforms the state-of-the-art YOLO (You Only Look Once) v3.0 and Faster R-CNN (VGG16-based) detection models, and also produces better tracking results than either of those models when they serve as the detector for the MDP tracker.
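
For readers unfamiliar with the tracking metric, MOTA is commonly defined in the CLEAR MOT framework as one minus the ratio of accumulated errors (missed targets, false positives, and identity switches) to the total number of ground-truth objects over all frames. The following is a minimal sketch of that standard definition, not the evaluation code used in this research; the function name `mota` and the per-frame counts in the usage example are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Tuple

# Per-frame counts: (false_negatives, false_positives, id_switches, ground_truth)
FrameCounts = Tuple[int, int, int, int]


def mota(frames: Iterable[FrameCounts]) -> float:
    """Compute MOTA = 1 - (FN + FP + IDSW) / GT, summed over all frames."""
    total_fn = total_fp = total_idsw = total_gt = 0
    for fn, fp, idsw, gt in frames:
        total_fn += fn
        total_fp += fp
        total_idsw += idsw
        total_gt += gt
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    return 1.0 - (total_fn + total_fp + total_idsw) / total_gt


if __name__ == "__main__":
    # Hypothetical counts for three frames with 10 ground-truth objects each.
    example = [(1, 0, 0, 10), (2, 1, 0, 10), (0, 1, 1, 10)]
    print(f"MOTA = {mota(example):.3f}")
```

Because errors are accumulated before dividing, MOTA can become negative when the tracker makes more errors than there are ground-truth objects, so it should not be read as a percentage of correctly tracked targets.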