Learning to disentangle scenes for person re-identification

Zang, Xianghao; Li, Ge; Gao, Wei; Shu, Xiujun

doi:10.1016/j.imavis.2021.104330

Cited by 39 publications

(3 citation statements)

References 4 publications

“…Ye et al [31] considered both intermodality and intra-modality changes, and designed a high-order loss constraint based on bidirectional constraints to constrain pedestrian features on the basis of a two-path network structure. In order to reduce the burden of the network, Zang et al [32] proposed a general multipartite network, in which these branches cooperate in learning to deal with different scenarios. Zhu et al [6] proposed that the loss of the heterogeneous center can reduce intra-class transmorphological changes.…”

Section: Related Work (mentioning)

confidence: 99%

Learning Visible Thermal Person Re-Identification via Spatial Dependence and Dual-Constraint Loss

Wang, Zhang, Feng, et al. 2022. Entropy

Visible thermal person re-identification (VT Re-ID) is the task of matching pedestrian images collected by thermal and visible light cameras. The two main challenges presented by VT Re-ID are the intra-class variation between pedestrian images and the cross-modality difference between visible and thermal images. Existing works have principally focused on local representation through cross-modality feature distribution, but ignore the internal connection of the local features of pedestrian body parts. Therefore, this paper proposes a dual-path attention network model to establish the spatial dependency relationship between the local features of the pedestrian feature map and to effectively enhance the feature extraction. Meanwhile, we propose cross-modality dual-constraint loss, which adds the center and boundary constraints for each class distribution in the embedding space to promote compactness within the class and enhance the separability between classes. Our experimental results show that our proposed approach has advantages over the state-of-the-art methods on the two public datasets SYSU-MM01 and RegDB. The result for the SYSU-MM01 is Rank-1/mAP 57.74%/54.35%, and the result for the RegDB is Rank-1/mAP 76.07%/69.43%.

“…Zhao et al [11] proposed a multi-branch re-identification network based on saliency-guided asymmetric mutual hashing, using saliency maps generated by teacher networks to guide student networks to learn high-quality hash codes. Zang et al [13] utilized scene discriminative features to enhance the representation ability of multi-branch re-identification networks.…”

Section: Introduction (mentioning)

confidence: 99%

Multi-branch mutual learning net for vehicle re-identification

2023. AJCIS

In vehicle re-identification, a single-branch network has weak global feature extraction and is not sensitive to specific scenes, because of interference such as occlusion caused by fixed camera placement and scale changes as vehicles approach or move away. To address this problem, ML Net (Mutual Learning Net) is proposed. The multi-branch structure of the network consists of one master branch and three slave branches: the master branch is responsible for feature learning in general complex scenes, and the three slave branches are responsible for feature learning under image occlusion, scale change, and vehicle color change, so that the network is sensitive to specific scenes. The models are trained jointly at each branch stage with Label Smoothing Cross Entropy Loss, Triplet Loss, and KL (Kullback-Leibler divergence) Loss. The experimental results show that the proposed ML Net achieves advanced results on two publicly available datasets, VeRi776 and VehicleID.
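The three loss terms named in this abstract are standard and can be sketched in plain Python. This is only an illustrative sketch, not the ML Net implementation: the function names, the margin of 0.3, the smoothing factor of 0.1, the unit weights, and the choice to pull the slave branch toward the master branch with KL(master || slave) are all assumptions that the abstract does not specify.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax via the log-sum-exp trick.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def label_smoothing_ce(logits, target, eps=0.1):
    # Cross entropy against a smoothed target: (1 - eps) * one_hot + eps / K.
    k = len(logits)
    log_p = log_softmax(logits)
    loss = 0.0
    for i, lp in enumerate(log_p):
        q = eps / k + ((1.0 - eps) if i == target else 0.0)
        loss -= q * lp
    return loss

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Hinge on Euclidean distances: push the negative at least `margin`
    # farther from the anchor than the positive.
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(0.0, d_ap - d_an + margin)

def kl_divergence(p_logits, q_logits):
    # KL(P || Q) between the softmax distributions of two branches.
    log_p = log_softmax(p_logits)
    log_q = log_softmax(q_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def joint_loss(master_logits, slave_logits, target,
               anchor, positive, negative, w_tri=1.0, w_kl=1.0):
    # Hypothetical combination for one slave branch; weights are assumptions.
    ce = label_smoothing_ce(slave_logits, target)
    tri = triplet_loss(anchor, positive, negative)
    kl = kl_divergence(master_logits, slave_logits)
    return ce + w_tri * tri + w_kl * kl
```

When the two branches produce identical logits the KL term vanishes, so in this sketch the mutual-learning term only contributes while the branches disagree.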

“…Pedestrian retrieval is a critical task in intelligent surveillance [1] [2] [3]. Given a pedestrian image as the query, pedestrian retrieval aims to find the right images in a large gallery.…”

Section: Introduction (mentioning)

confidence: 99%

Multi-direction and Multi-scale Pyramid in Transformer for Video-based Pedestrian Retrieval

Zang, Li, Gao. 2022. Preprint

Self Cite

In video surveillance, pedestrian retrieval (also called person re-identification) is a critical task. This task aims to retrieve the pedestrian of interest from non-overlapping cameras. Recently, transformer-based models have achieved significant progress for this task. However, these models still suffer from ignoring fine-grained, part-informed information. This paper proposes a multi-direction and multi-scale Pyramid in Transformer (PiT) to solve this problem. In transformer-based architecture, each pedestrian image is split into many patches. Then, these patches are fed to transformer layers to obtain the feature representation of this image. To explore the fine-grained information, this paper proposes to apply vertical division and horizontal division on these patches to generate different-direction human parts. These parts provide more fine-grained information. To fuse multi-scale feature representation, this paper presents a pyramid structure containing global-level information and many pieces of local-level information from different scales. The feature pyramids of all the pedestrian images from the same video are fused to form the final multi-direction and multi-scale feature representation. Experimental results on two challenging video-based benchmarks, MARS and iLIDS-VID, show the proposed PiT achieves state-of-the-art performance. Extensive ablation studies demonstrate the superiority of the proposed pyramid structure. The code is available at https://git.openi.org.cn/zangxh/PiT.git.

Index Terms: video-based pedestrian retrieval, vision transformer, multi-direction and multi-scale pyramid.
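The division-and-pyramid idea described in this abstract can be illustrated with a small sketch over a grid of patch features. This is not the PiT code: the helper names, the use of mean pooling, the scales (2, 4), and the convention that "horizontal" means row bands stacked top to bottom while "vertical" means column bands side by side are all assumptions made for illustration.

```python
def mean_pool(vectors):
    # Average a list of equal-length feature vectors element-wise.
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / n for d in range(dim)]

def divide(grid, parts, direction):
    # Split an H x W grid of patch features into `parts` equal bands
    # and mean-pool each band into one part-level feature.
    rows, cols = len(grid), len(grid[0])
    size = rows if direction == "horizontal" else cols
    if size % parts != 0:
        raise ValueError("grid size must be divisible by the number of parts")
    step = size // parts
    pooled = []
    for p in range(parts):
        lo, hi = p * step, (p + 1) * step
        if direction == "horizontal":
            band = [grid[r][c] for r in range(lo, hi) for c in range(cols)]
        else:
            band = [grid[r][c] for r in range(rows) for c in range(lo, hi)]
        pooled.append(mean_pool(band))
    return pooled

def pyramid(grid, scales=(2, 4)):
    # One global feature plus part-level features in both directions
    # at each scale (the scales here are hypothetical).
    all_patches = [v for row in grid for v in row]
    levels = {"global": [mean_pool(all_patches)]}
    for s in scales:
        levels[f"horizontal_{s}"] = divide(grid, s, "horizontal")
        levels[f"vertical_{s}"] = divide(grid, s, "vertical")
    return levels

# Usage: a 4 x 4 grid whose patch feature is simply its (row, column) position.
grid = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
levels = pyramid(grid)
```

For a video, one simple (assumed) way to fuse per-frame pyramids would be to average the corresponding entries across frames; the abstract does not state which fusion the authors use.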
