Multi-class object detection has evolved rapidly in the last few years, driven in particular by learning-based approaches built on deep Convolutional Neural Networks (CNNs). However, the successful approaches rely on high-resolution ground-level images and extremely large volumes of data, as in the COCO and VOC datasets. Meanwhile, the availability of drones has increased in the last few years, and several new applications have emerged as a result. One such application is understanding drone footage by analysing, detecting and recognizing different objects in the covered area. In this study, a collection of large images captured by a drone flying at a fixed altitude over a desert area within the United Arab Emirates (UAE) is used for training and evaluating the CNNs under investigation. Three state-of-the-art CNN architectures, namely SSD-500 with VGGNet-16 meta-architecture, SSD-500 with ResNet meta-architecture and YOLO-V3 with Darknet-53, are optimally configured, re-trained, tested and evaluated for the detection of three classes of objects in the captured footage: palm trees, group-of-animals/cattle and animal sheds in farms. Our preliminary experiments revealed that YOLO-V3 outperformed SSD-500 with VGGNet-16 by a large margin and showed a considerable improvement over SSD-500 with ResNet. It was therefore selected for further investigation, with the aim of proposing an efficient coarse-to-fine object detection model for multi-class object detection in drone images. To this end, the impact of changing the activation function of the hidden units and the pooling type in the pooling layers is investigated in detail. In addition, the impact of tuning the learning rate and of selecting the most effective optimization method for general hyper-parameter tuning is also investigated.
The results demonstrate that the developed multi-class object detector achieves a precision of 0.99, a recall of 0.94 and an F-score of 0.96, indicating the effectiveness of the developed multi-class object detection network.
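As a quick consistency check on the reported figures, the sketch below recomputes the F-score from the stated precision and recall. It assumes the reported F-score is the balanced F1 measure (the harmonic mean of precision and recall), which the text does not state explicitly; the function name `f_score` and its `beta` parameter are illustrative and not taken from the study.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1.0 gives the balanced F1 measure.

    Assumption: the reported F-score is F1. This is not confirmed by the text.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = (beta ** 2) * precision + recall
    if denominator == 0:
        # Both precision and recall are zero, so the score is defined as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


if __name__ == "__main__":
    # Values reported in the abstract.
    reported_precision = 0.99
    reported_recall = 0.94
    print(f"F1 = {f_score(reported_precision, reported_recall):.2f}")
```

Under this assumption the recomputed value rounds to 0.96, which agrees with the reported F-score.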