“…Extracting meaningful information from the environment is a challenging task [40,39,51]. In recent years, deep neural networks are becoming more and more popular for knowledge discovering in many computer vision tasks, such as object recognition [44,50], object detection [24,19], visual question answering [45], pose estimateion [17], image synthesis [42,41,43], face recognition [7], and depth estimation [15]. Object detection is the task of recognizing and localizing the objects in the images with the deep model trained on labelled ground truth [25].…”