Intelligent transportation systems (ITSs) play an increasingly important role in traffic management and traffic safety. Smart cameras are the most widely used sensors in ITSs. However, target occlusion and external environmental interference reduce their detection and positioning accuracy, which has become a bottleneck restricting ITS development. This work designs a stable perception system based on a millimeter-wave radar and a camera to address these problems. Radar offers better ranging accuracy and weather robustness than a camera, making it a natural complement to camera perception. Based on an improved Gaussian mixture probability hypothesis density (GM-PHD) filter, we also propose an optimal attribute fusion algorithm for target detection and tracking. The algorithm selects the sensors’ optimal measurement attributes to improve localization accuracy, and it introduces an adaptive attenuation function and loss tags to maintain the continuity of target trajectories. Verification experiments on both the algorithm and the perception system demonstrate that our scheme stably outputs the classification and high-precision localization information of targets. The proposed framework could guide the design of safer and more efficient ITSs at low cost.
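As a rough illustration of what selecting each sensor's best measurement attribute can mean, the following minimal Python sketch takes each attribute (range, bearing) from whichever sensor reports the lower variance for it. It is not the proposed algorithm: the GM-PHD filter, the adaptive attenuation function, and the loss tags are not reproduced here. The `Measurement` type, the function names, and the per-attribute variances are all hypothetical and are introduced only for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Measurement:
    """One detection in polar form, with an assumed variance per attribute."""
    range_m: float       # distance to the target, in meters
    bearing_rad: float   # azimuth angle, in radians
    range_var: float     # assumed variance of the range estimate
    bearing_var: float   # assumed variance of the bearing estimate


def fuse_attributes(radar: Measurement, camera: Measurement) -> Measurement:
    """Take each attribute from the sensor with the lower assumed variance."""
    rng_src = radar if radar.range_var <= camera.range_var else camera
    brg_src = radar if radar.bearing_var <= camera.bearing_var else camera
    return Measurement(
        range_m=rng_src.range_m,
        bearing_rad=brg_src.bearing_rad,
        range_var=rng_src.range_var,
        bearing_var=brg_src.bearing_var,
    )


def to_cartesian(m: Measurement) -> tuple[float, float]:
    """Convert a polar measurement to (x, y) coordinates in the sensor frame."""
    return (m.range_m * math.cos(m.bearing_rad),
            m.range_m * math.sin(m.bearing_rad))


# Illustrative values only: the radar is assumed to range more precisely,
# and the camera is assumed to resolve bearing more precisely.
radar = Measurement(range_m=50.0, bearing_rad=0.10, range_var=0.04, bearing_var=0.01)
camera = Measurement(range_m=47.0, bearing_rad=0.08, range_var=4.0, bearing_var=0.0004)
fused = fuse_attributes(radar, camera)
x, y = to_cartesian(fused)
```

With these made-up variances the fused measurement takes its range from the radar and its bearing from the camera. A hard per-attribute selection is the simplest possible choice; a variance-weighted combination would be another option under the same assumptions.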