Traffic sign detection is a critical technology in intelligent transportation systems (ITS). Its main difficulty lies in detecting small objects quickly and accurately in wide, complex traffic scenes. In this paper, we treat traffic sign detection as a region classification problem and propose a two-stage CNN-based approach to solve it. In the first stage, we design an efficient network built from improved fire-modules to generate object proposals quickly. The network up-samples and merges feature maps of different scales to obtain a high-resolution fused feature map that contains semantically strong features of multi-scale objects. Specifically, predictions are made on the fused feature map and are based on a novel center-point estimation. With these designs, our region proposal network achieves high recall even on low-resolution images. In the second stage, we propose a separate classification network. Classification performance is generally limited by the highly similar appearance of different traffic signs. Therefore, we further explore local regions that capture the critical differences between traffic signs to obtain fine-grained local features that help improve classification. Finally, we evaluate our method on the challenging Tsinghua-Tencent 100K benchmark, which provides many large images with small traffic sign instances. Experimental results show that our method achieves better performance and faster detection speed than many state-of-the-art traffic sign detection methods.

INDEX TERMS Traffic sign detection, multi-scale, center-point estimation, local features.
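
As a rough illustration of the multi-scale fusion step summarized above, the following minimal sketch up-samples a coarse feature map to the resolution of a finer one, merges the two, and reads off a candidate center from the strongest response. It is not the network described in this paper: the helper names (`upsample_nearest`, `fuse`, `peak_center`), the nearest-neighbour up-sampling, the merge by element-wise addition, and the peak-picking rule are all assumptions made for illustration.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour up-sampling of a 2-D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]  # repeat each column
        out.extend(list(wide) for _ in range(factor))   # repeat each row
    return out


def fuse(fine, coarse):
    """Up-sample `coarse` to the size of `fine` and add element-wise.

    Addition is an assumed merge rule; the abstract does not say whether
    the maps are added or concatenated.
    """
    factor = len(fine) // len(coarse)
    up = upsample_nearest(coarse, factor)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(fine, up)]


def peak_center(score_map, stride):
    """Map the highest-scoring cell to an (x, y) center in image pixels.

    Hypothetical stand-in for center-point estimation: it only takes the
    single strongest cell and returns the middle of that cell.
    """
    _, x, y = max(
        (v, x, y) for y, row in enumerate(score_map) for x, v in enumerate(row)
    )
    return ((x + 0.5) * stride, (y + 0.5) * stride)


# Toy maps: a 4x4 fine map and a 2x2 coarse map covering the same area.
fine = [[1, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 5]]
coarse = [[1, 2],
          [3, 4]]

fused = fuse(fine, coarse)
center = peak_center(fused, stride=4)
```

In this toy case the fused map keeps the fine map's resolution while inheriting the coarse map's responses, so a small object that is weak at one scale can still produce a clear peak after merging.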