This study proposes a vision-based method for estimating traffic sign attributes, namely 3D position and pose, from image sequences captured by binocular or monocular cameras. The method starts by acquiring robust feature correspondences from image pairs based on homography constraints. An objective function is then designed to integrate these feature correspondences and optimise the parameters of the traffic sign plane in the 3D coordinate system. Finally, the estimated sign plane is used for attribute estimation. In addition, the authors provide an extension of the raw KITTI dataset, which can be used for the 3D tasks of traffic sign localisation and pose estimation. In the experiments, the proposed method is compared with three popular methods on the publicly available BelgiumTS and KITTI datasets. The results show that the authors' method, based on SIFT and SURF features, can locate traffic signs with mean errors of ∼0.44 and ∼0.51 m on the BelgiumTS and KITTI datasets, respectively, and estimate the pose with a mean error of ∼14.45° on the KITTI dataset.
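As an illustration of the geometry behind the pipeline, the following is a minimal sketch and not the authors' implementation. It assumes a sign plane written as n·X = d in the first camera's frame, a second view related by X2 = R X1 + t, and pinhole intrinsics K; the function names, the parameterisation and the pose definition (the angle between the plane normal and the optical axis) are all assumptions made for this example. The first function gives the standard plane-induced homography that constrains correspondences on the sign, and the second recovers a 3D position and an angle from the plane.

```python
import numpy as np

def plane_induced_homography(K1, K2, R, t, n, d):
    """Homography mapping view-1 pixels to view-2 pixels for points on
    the plane n.X = d (camera-1 frame), with X2 = R @ X1 + t.
    Hypothetical helper; the parameterisation is an assumption."""
    return K2 @ (R + np.outer(t, n) / d) @ np.linalg.inv(K1)

def sign_position_and_angle(K, pixel, n, d):
    """Intersect the viewing ray through `pixel` with the plane n.X = d.
    Returns the 3D point and the angle (degrees) between the plane
    normal and the optical axis; this pose definition is an assumption."""
    ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    point = (d / (n @ ray)) * ray          # scale the ray so it lies on the plane
    n_unit = n / np.linalg.norm(n)
    # abs() folds the two possible normal orientations into [0, 90] degrees
    angle = np.degrees(np.arccos(np.clip(abs(n_unit[2]), 0.0, 1.0)))
    return point, angle

# Illustrative values only: a fronto-parallel plane 10 m ahead and a
# stereo-like translation of 0.5 m along the x-axis.
K = np.array([[700.0, 0.0, 640.0],
              [0.0, 700.0, 360.0],
              [0.0, 0.0, 1.0]])
n = np.array([0.0, 0.0, 1.0])
d = 10.0
R = np.eye(3)
t = np.array([-0.5, 0.0, 0.0])

H = plane_induced_homography(K, K, R, t, n, d)
point, angle = sign_position_and_angle(K, (710.0, 360.0), n, d)
mapped = H @ np.array([710.0, 360.0, 1.0])
mapped = mapped / mapped[2]                # normalise homogeneous coordinates
```

In a full pipeline the plane parameters would be the unknowns being optimised against many correspondences; here they are fixed by hand only to show how position and angle follow once the plane is known.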