Visible–infrared person re-identification (VI-ReID) is an emerging technology for realizing all-weather smart surveillance systems. To address the problem of pedestrian discriminative information being difficult to obtain and easy to lose, as well as the wide modality difference in the VI-ReID task, in this paper we propose a two-stage attribute embedding and modality consistency learning-based VI-ReID method. First, the attribute information embedding module introduces the fine-grained pedestrian information in the attribute label into the transformer backbone, enabling the backbone to extract identity-discriminative pedestrian features. After obtaining the pedestrian features, the attribute embedding enhancement module is utilized to realize the second-stage attribute information embedding, which reduces the adverse effect of losing the person discriminative information due to the deepening of network. Finally, the modality consistency learning loss is designed for constraining the network to mine the consistency information between two modalities in order to reduce the impact of modality difference on the recognition results. The results show that our method reaches 74.57% mAP on the SYSU-MM01 dataset in All Search mode and 87.02% mAP on the RegDB dataset in IR-to-VIS mode, with a performance improvement of 6.00% and 2.56%, respectively, proving that our proposed method is able to reach optimal performance compared to existing state-of-the-art methods.