Object detection has attracted growing interest because of its relevance to video analysis and image interpretation. Traditional object detection approaches relied on handcrafted features and shallow trainable algorithms, which limited their performance. The advancement of deep learning (DL) has provided more powerful tools that can extract semantic, high-level, deep features, addressing the shortcomings of these earlier systems. DL-based object detection models differ in network architecture, training techniques, and optimization functions. This study investigates common generic designs for object detection, along with various modifications and techniques for enhancing detection performance. It also discusses future directions in object detection research, including advancements in neural network-based learning systems and the challenges that remain. In addition, it presents a comparative analysis, based on performance parameters, of various versions of the YOLO approach for multiple object detection.