Oblique photogrammetry-based three-dimensional (3D) urban models are widely used for smart cities. In 3D urban models, road signs are small but provide valuable information for navigation. However, due to the problems of sliced shape features, blurred texture and high incline angles, road signs cannot be fully reconstructed in oblique photogrammetry, even with state-of-the-art algorithms. The poor reconstruction of road signs commonly leads to less informative guidance and unsatisfactory visual appearance. In this paper, we present a pipeline for embedding road sign models based on deep convolutional neural networks (CNNs). First, we present an end-to-end balanced-learning framework for small object detection that takes advantage of the region-based CNN and a data synthesis strategy. Second, under the geometric constraints placed by the bounding boxes, we use the scale-invariant feature transform (SIFT) to extract the corresponding points on the road signs. Third, we obtain the coarse location of a single road sign by triangulating the corresponding points and refine the location via outlier removal. Least-squares fitting is then applied to the refined point cloud to fit a plane for orientation prediction. Finally, we replace the road signs with computer-aided design models in the 3D urban scene with the predicted location and orientation. The experimental results show that the proposed method achieves a high mAP in road sign detection and produces visually plausible embedded results, which demonstrates its effectiveness for road sign modeling in oblique photogrammetry-based 3D scene reconstruction.