Most video surveillance systems use both RGB and infrared cameras, so re‐identifying a person across the RGB and infrared modalities is a vital technique. The task is challenging because of both the cross‐modality variations caused by the heterogeneity of RGB and infrared images, and the intra‐modality variations caused by differences in human pose, camera position, illumination and similar factors. To meet these challenges, a novel feature learning framework, the hard pentaplet and identity loss network (HPILN), is proposed. In this framework, existing single‐modality re‐identification models are first modified to suit the cross‐modality scenario; a specifically designed hard pentaplet loss and an identity loss are then used to increase the accuracy of the modified cross‐modality re‐identification models. Extensive experiments on the SYSU‐MM01 benchmark dataset show that the authors’ method outperforms existing methods in terms of the cumulative match characteristic (CMC) curve and mean average precision (mAP).
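The two evaluation measures named above can be illustrated with a minimal sketch. It assumes a precomputed query-by-gallery distance matrix and integer identity labels; the function name `cmc_and_map` and its parameters are hypothetical and do not come from the authors' code, and dataset-specific protocol details (such as camera-based filtering of the gallery) are omitted.

```python
import numpy as np


def cmc_and_map(dist, query_ids, gallery_ids, max_rank=5):
    """Return (CMC curve up to max_rank, mAP) for a distance matrix.

    dist[q, g] is the distance between query q and gallery item g
    (smaller means more similar). Hypothetical helper for illustration.
    """
    dist = np.asarray(dist, dtype=float)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)

    cmc = np.zeros(max_rank)
    average_precisions = []

    for q in range(dist.shape[0]):
        # Rank gallery items from most to least similar.
        order = np.argsort(dist[q], kind="stable")
        matches = gallery_ids[order] == query_ids[q]
        if not matches.any():
            continue  # no correct match in the gallery: skip this query

        # CMC: a query counts as a hit at rank k if its first correct
        # match appears at position k or earlier.
        first_hit = int(np.argmax(matches))
        if first_hit < max_rank:
            cmc[first_hit:] += 1

        # AP: mean of the precision values at each correct match.
        hit_positions = np.flatnonzero(matches)
        precisions = np.arange(1, len(hit_positions) + 1) / (hit_positions + 1)
        average_precisions.append(precisions.mean())

    if not average_precisions:
        raise ValueError("no query has a matching gallery identity")

    n_valid = len(average_precisions)
    return cmc / n_valid, float(np.mean(average_precisions))


# Toy example: 2 queries (e.g. infrared), 3 gallery items (e.g. RGB).
toy_dist = [[0.1, 0.9, 0.5],
            [0.1, 0.2, 0.3]]
toy_cmc, toy_map = cmc_and_map(toy_dist, [1, 2], [1, 2, 2], max_rank=3)
```

In this toy case the first query finds its match at rank 1, while the second finds its first match only at rank 2, so the rank-1 CMC value is 0.5 and the curve reaches 1.0 from rank 2 onwards.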