For self-driving vehicles in real-world traffic scenarios, it is challenging to balance real-time performance against high accuracy when detecting, recognizing, and tracking targets in videos. This paper addresses the issue with an improved YOLOv3 (You Only Look Once) and a multi-object tracking algorithm (Deep-Sort). First, data augmentation is applied to traffic-sign classes with few samples to address the extremely unbalanced class distribution in the dataset. Second, a new YOLOv3 architecture is proposed to make it more suitable for detecting small targets. Specifically, (1) the output feature map corresponding to 32-times subsampling of the input image in the original YOLOv3 structure is removed to reduce computational cost and improve real-time performance; (2) an output feature map of 4-times subsampling is added to improve detection capability for small traffic signs; and (3) Deep-Sort is integrated into the detection method to improve the precision and robustness of multi-object detection and the tracking ability in videos. Finally, our method demonstrates better detection capability than state-of-the-art approaches, achieving a precision, recall, and mAP of 91%, 90%, and 84.76%, respectively.
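As a rough illustration of architectural changes (1) and (2) above, the following sketch compares the output grid resolutions of the standard YOLOv3 scales (8-, 16-, and 32-times subsampling) with the modified set (4-, 8-, and 16-times). The 416-pixel input size, the 16-pixel sign width, and the helper names `grid_sizes` and `cells_spanned` are assumptions made for illustration only; they are not taken from the paper.

```python
def grid_sizes(input_size, strides):
    """Map each subsampling factor (stride) to the side length of its output grid."""
    return {stride: input_size // stride for stride in strides}


def cells_spanned(object_px, stride):
    """Number of grid cells an object of width `object_px` covers at a given stride."""
    return object_px / stride


# Assumed square input resolution (hypothetical; not stated in the text).
INPUT_SIZE = 416

# Standard YOLOv3 predicts at 8x, 16x, and 32x subsampling.
ORIGINAL_STRIDES = (8, 16, 32)
# Modified set: the 32x output is removed and a 4x output is added.
MODIFIED_STRIDES = (4, 8, 16)

original = grid_sizes(INPUT_SIZE, ORIGINAL_STRIDES)
modified = grid_sizes(INPUT_SIZE, MODIFIED_STRIDES)

# A hypothetical 16-pixel-wide traffic sign, measured in grid cells.
SIGN_PX = 16
coarsest_original = cells_spanned(SIGN_PX, 32)  # half a cell at 32x
finest_modified = cells_spanned(SIGN_PX, 4)     # four cells at 4x

if __name__ == "__main__":
    print("original grids:", original)
    print("modified grids:", modified)
    print("16 px sign at 32x:", coarsest_original, "cells")
    print("16 px sign at 4x:", finest_modified, "cells")
```

Under these assumptions, a small sign covers only a fraction of one cell on the 32-times map but several cells on the 4-times map, which is consistent with the motivation for trading the coarsest scale for a finer one.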