HSME: Hypersphere Manifold Embedding for Visible Thermal Person Re-Identification

Hao, Yi; Wang, Nannan; Li, Jie; Gao, Xinbo

doi:10.1609/aaai.v33i01.33018385

Cited by 238 publications

(108 citation statements)

References 18 publications

Mentioning: 108

“…For performance measure, the rank‐1, ‐10, ‐20 accuracies of CMC and mAP are used to show the clear performance superiority of our HPILN method. The comparison takes advantage of seven state‐of‐the‐art methods: zero‐padding [7], cmGAN [9], bi‐directional dual‐constrained top‐ranking (BDTR) [8], inter‐channel pair between the visible‐light and thermal images + multi‐scale Retinex (IPVT‐1 + MSR) [10], D²RL [11], bi‐directional center‐constrained top‐ranking (eBDTR) [27] and D‐hypersphere manifold embedding (HSME) [28].…”

Section: Results (mentioning)

confidence: 99%
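The rank‐k CMC accuracies and mAP named in the statement above can be illustrated with a minimal sketch. It assumes a precomputed query-by-gallery distance matrix and integer identity labels; the function name `cmc_and_map` and its parameters are hypothetical, and the sketch leaves out any dataset-specific protocol details such as camera filtering, so it is not the evaluation code of any cited paper.

```python
from typing import Dict, List, Sequence, Tuple


def cmc_and_map(
    dist: Sequence[Sequence[float]],
    query_ids: Sequence[int],
    gallery_ids: Sequence[int],
    ranks: Sequence[int] = (1, 10, 20),
) -> Tuple[Dict[int, float], float]:
    """Return ({rank: CMC accuracy}, mAP) for a query-by-gallery distance matrix."""
    hits = {r: 0 for r in ranks}
    average_precisions: List[float] = []
    valid_queries = 0

    for q, row in enumerate(dist):
        # Sort gallery indices from closest to farthest for this query.
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if not any(matches):
            continue  # a query with no true match in the gallery is skipped
        valid_queries += 1

        # CMC: rank-k counts as a hit if the first correct match is in the top k.
        first_hit = matches.index(True) + 1
        for r in ranks:
            if first_hit <= r:
                hits[r] += 1

        # AP: mean of the precision values at each correct match position.
        found = 0
        precisions: List[float] = []
        for position, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / position)
        average_precisions.append(sum(precisions) / len(precisions))

    if valid_queries == 0:
        raise ValueError("no query has a matching gallery identity")
    cmc = {r: hits[r] / valid_queries for r in ranks}
    return cmc, sum(average_precisions) / valid_queries


# Toy example: 2 queries, 3 gallery images (identities are made up).
toy_dist = [[0.1, 0.9, 0.5], [0.1, 0.9, 0.3]]
toy_cmc, toy_map = cmc_and_map(toy_dist, [1, 2], [1, 2, 2], ranks=(1, 2))
```

In the toy example, the first query finds its match at rank 1 and the second at rank 2, so rank‐1 accuracy is 0.5 and rank‐2 accuracy is 1.0.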

“…In addition, other existing methods are used for comparison, including handcrafted features such as Histograms of Oriented Gradient (HOG) [29] and Local Maximal Occurrence (LOMO) [30], cross‐domain models such as Common Discriminant Feature Extraction (CDFE) [31] and Camera coRrelation Aware Feature augmenTation (CRAFT) [32], canonical correlation analysis (CCA) [33], one‐stream and two‐stream networks [7], and metric learning method Local Fisher Discriminant Analysis (LFDA) [34]. Most of the results were obtained from the references [7–11, 27, 28].…”

Section: Results (mentioning)

confidence: 99%


HPILN: a feature learning framework for cross‐modality person re‐identification

et al. 2019


Most video surveillance systems use both RGB and infrared cameras, making it vital to re‐identify a person across the RGB and infrared modalities. This task is challenging because of both the cross‐modality variations caused by heterogeneous RGB and infrared images and the intra‐modality variations caused by differences in human pose, camera position, and lighting. To meet these challenges, a novel feature learning framework, the hard pentaplet and identity loss network (HPILN), is proposed. In the framework, existing single‐modality re‐identification models are modified to fit the cross‐modality scenario, after which a specifically designed hard pentaplet loss and an identity loss are used to increase the accuracy of the modified models. Extensive experiments on the SYSU‐MM01 benchmark show that the authors’ method outperforms all existing ones in terms of cumulative match characteristic curve and mean average precision.


“…We evaluate the proposed method (CoAL) on the SYSU-MM01 [50] and RegDB [32] datasets, and compare with state-of-the-art methods, including Zero-Pad [50], HCML [54], BDTR [55], cmGAN [8], MAC [53], D²RL [44], D-HSME [15], AlignGAN [41], MSR [12], CMSP [49], X-Modal [21], and Hi-CMD [7]. As it can be seen from the results presented in Table 1 and Table 2, our proposed method outperforms state-of-the-art methods significantly on both two datasets.…”

Section: Comparison With State-of-the-art Methods (mentioning)

confidence: 99%

“…Wang et al [43] performed a comprehensive survey of heterogeneous person re-identification. Specifically, several types of cross-modality person ReID have been studied, including Image-to-Text cross-modality retrieval [24], Photo-to-Sketch cross-modality retrieval [34], and popular Infrared-to-Visible cross-modality retrieval [8,15,21,41,44,50,54,55,58]. Li et al [24] proposed that searching a person with free-form natural language descriptions can be widely applied in video surveillance and build a dataset for image-text cross-modality retrieval.…”

Section: Cross-modality Retrieval (mentioning)

confidence: 99%

“…Ye et al [55] proposed a dual-constrained pair-wise triplet loss to directly optimize cross-modal matching. Hao et al [15] proposed a unified cross-modal feature learning and metric learning approach in a hyper-sphere manifold space. Recently, a few of methods based on generative adversarial networks (GANs) [13] have been proposed to narrow the semantic gap between the infrared and visible modalities.…”

Section: Cross-modality Retrieval (mentioning)

confidence: 99%
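The idea of matching features on a hypersphere, mentioned in the statement above, can be illustrated generically: L2-normalizing a feature vector places it on the unit hypersphere, where the dot product of two vectors equals the cosine of the angle between them. The sketch below assumes features have already been extracted as plain lists of floats; the function names are hypothetical, and this is not the loss or training procedure of Hao et al.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Project a feature vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors, via their unit-norm projections."""
    ua, ub = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


def rank_gallery(query: Sequence[float], gallery: Sequence[Sequence[float]]) -> List[int]:
    """Return gallery indices sorted from most to least similar to the query."""
    sims = [cosine_similarity(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: -sims[i])


# Toy example: one query feature (e.g. thermal) against three gallery
# features (e.g. visible). The values are made up for illustration.
toy_query = [3.0, 4.0]
toy_gallery = [[-1.0, 0.0], [6.0, 8.0], [1.0, 0.0]]
toy_ranking = rank_gallery(toy_query, toy_gallery)
```

Because the features are normalized, the gallery vector `[6.0, 8.0]` ranks first even though its magnitude differs from the query's: only the direction matters on the hypersphere.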


Co-Attentive Lifting for Infrared-Visible Person Re-Identification

Wei, Hong, et al. 2020

Proceedings of the 28th ACM International Conference on Multimedia


Infrared-visible cross-modality person re-identification (IV-ReID) has attracted much attention with the popularity of dual-mode video surveillance systems, where the RGB mode works in the daytime and automatically switches to the infrared mode at night. Despite its significant application value, IV-ReID remains a difficult problem mainly due to two great challenges. First, it is difficult to identify persons in the infrared image, which lacks color and texture clues. Second, there is a significant gap between the infrared and visible modalities where appearances of the same person vary considerably. This paper proposes a novel attention-based approach to handle the two difficulties in a unified framework. 1) We propose an attention lifting mechanism to learn discriminative features in each modality. 2) We propose a co-attentive learning mechanism to bridge the gap between the two modalities. Our method only makes slight modifications of a given backbone network and requires small computation overhead while improving the performance significantly. We conduct extensive experiments to demonstrate the superiority of our proposed method.

CCS Concepts: Computing methodologies → Visual content-based indexing and retrieval; Appearance and texture representations. Information systems → Nearest-neighbor search.
