Bag of Tricks and a Strong Baseline for Deep Person Re-Identification

Luo, Hao; Gu, Youzhi; Liao, Xingyu; Lai, Shupeng; Jiang, Wei

doi:10.1109/cvprw.2019.00190

Cited by 1,088 publications

(466 citation statements)

References 23 publications

Citation statement types: Supporting, Mentioning (462), Contrasting, Unclassified


“…The network is updated for 100 epochs by the stochastic gradient descent algorithm with a weight decay of 5×10⁻⁴. Following [37], the warmup learning rate adjustment strategy is applied to bootstrap the network for better performance. The learning rate linearly increases from 0.06 to 0.6 in the first 10 epochs.…”

Section: B Implementation Details (mentioning)

confidence: 99%
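
The warmup schedule described in the statement above can be sketched as follows. This is a minimal illustration, not the citing authors' code: the function name `warmup_lr`, the zero-based epoch indexing, and the choice to hold the rate at 0.6 once warmup ends are all assumptions, since the quoted excerpt is truncated and does not say what happens after epoch 10.

```python
def warmup_lr(epoch, start_lr=0.06, peak_lr=0.6, warmup_epochs=10):
    """Linear warmup from start_lr to peak_lr over warmup_epochs.

    Assumptions (not stated in the quoted text): epochs are counted
    from 0, and the rate stays at peak_lr after warmup finishes.
    """
    if epoch >= warmup_epochs:
        return peak_lr
    # Fraction of the warmup completed so far, in [0, 1).
    progress = epoch / warmup_epochs
    return start_lr + (peak_lr - start_lr) * progress


if __name__ == "__main__":
    # Print the learning rate for the first 12 epochs.
    for epoch in range(12):
        print(epoch, round(warmup_lr(epoch), 4))
```

Starting at a tenth of the target rate and ramping up linearly keeps the earliest updates small while the network weights are still far from a useful region.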

End-to-End Comparative Attention Networks for Person Re-Identification

Liu, Feng, Qi et al. 2017

IEEE Trans. on Image Process.


Abstract: Person re-identification across disjoint camera views has been widely applied in video surveillance, yet it is still a challenging problem. One of the major challenges lies in the lack of spatial and temporal cues, which makes it difficult to deal with large variations of lighting conditions, viewing angles, body poses and occlusions. Recently, several deep learning based person re-identification approaches have been proposed and achieved remarkable performance. However, most of those approaches extract discriminative features from the whole frame at one glimpse without differentiating various parts of the persons to identify. It is essentially important to examine multiple highly discriminative local regions of the person images in detail through multiple glimpses for dealing with the large appearance variance. In this paper, we propose a new soft attention based model, i.e., the end-to-end Comparative Attention Network (CAN), specifically tailored for the task of person re-identification. The end-to-end CAN learns to selectively focus on parts of pairs of person images after taking a few glimpses of them and adaptively comparing their appearance. The CAN model is able to learn which parts of images are relevant for discerning persons and automatically integrates information from different parts to determine whether a pair of images belongs to the same person. In other words, our proposed CAN model simulates the human perception process to verify whether two images are from the same person. Extensive experiments on four benchmark person re-identification datasets, including CUHK01, CUHK03, Market-1501 and VIPeR, clearly demonstrate that our proposed end-to-end CAN for person re-identification significantly outperforms well-established baselines and offers new state-of-the-art performance.


“…Even though the current top approach achieves 88.2% mAP on the Market dataset, we still outperform many recent methods by a large margin. Current top performing methods typically use a complex architecture [39,38,46] or tricks such as larger input images and more elaborate augmentations [20]. Our single-task baseline is essentially a simplified TriNet architecture [8], nevertheless, it still significantly improves the original mAP score of 69.14% by over 8%, yielding a solid baseline performance for person ReID.…”

Section: Quantitative Results (mentioning)

confidence: 99%

Visual Person Understanding Through Multi-task and Multi-dataset Learning

Pfeiffer, Hermans, Sárándi et al. 2019

Lecture Notes in Computer Science


We address the problem of learning a single model for person re-identification, attribute classification, body part segmentation, and pose estimation. With predictions for these tasks we gain a more holistic understanding of persons, which is valuable for many applications. This is a classical multi-task learning problem. However, no dataset exists that these tasks could be jointly learned from. Hence several datasets need to be combined during training, which in other contexts has often led to reduced performance in the past. We extensively evaluate how the different tasks and datasets influence each other and how different degrees of parameter sharing between the tasks affect performance. Our final model matches or outperforms its single-task counterparts without creating significant computational overhead, rendering it highly interesting for resource-constrained scenarios such as mobile robotics.


“…where BN (·) is the BNNeck introduced in [52], [·] means concatenation. The total loss is the summation of the four losses:…”

Section: Loss Functions (mentioning)

confidence: 99%
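
To make the notation in the statement above concrete, here is a rough sketch of a BNNeck-style normalization followed by concatenation. It is an illustration under stated assumptions rather than the citing paper's implementation: the names `bnneck` and `concat` are hypothetical, batch statistics are computed in training mode, and the scale is fixed at 1 with no shift for simplicity. The four losses mentioned in the excerpt are not shown because the quoted text does not list them.

```python
import math


def bnneck(features, eps=1e-5):
    """Batch-normalize a batch of feature vectors, dimension by dimension.

    Assumptions: training-mode batch statistics, scale fixed at 1,
    no shift term. `features` is a list of equal-length float lists.
    """
    n = len(features)
    dim = len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dim)]
    variances = [
        sum((f[d] - means[d]) ** 2 for f in features) / n for d in range(dim)
    ]
    return [
        [(f[d] - means[d]) / math.sqrt(variances[d] + eps) for d in range(dim)]
        for f in features
    ]


def concat(*vectors):
    """Concatenate feature vectors end to end, i.e. the [.] operation."""
    out = []
    for v in vectors:
        out.extend(v)
    return out


if __name__ == "__main__":
    batch = [[1.0, 2.0], [3.0, 6.0]]
    normalized = bnneck(batch)
    # Join the raw and normalized features of the first sample.
    print(concat(batch[0], normalized[0]))
```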

Few-Shot Deep Adversarial Learning for Video-Based Person Re-Identification

Wang, Yin et al. 2020

IEEE Trans. on Image Process.


Recent years have witnessed a great development of deep learning based video person re-identification (Re-ID). A key factor for video person Re-ID is how to effectively construct discriminative video feature representations that are robust to complicated situations such as occlusions. Recent part-based approaches employ spatial and temporal attention to extract the representative local features. However, the correlations between the parts are ignored in the previous methods. To leverage the relations of different parts, we propose an innovative adaptive graph representation learning scheme for video person Re-ID, which enables the contextual interactions between the relevant regional features. Specifically, we exploit pose alignment connection and feature affinity connection to construct an adaptive structure-aware adjacency graph, which models the intrinsic relations between graph nodes. We perform feature propagation on the adjacency graph to refine the original regional features iteratively, so that the neighbor nodes' information is taken into account for part feature representation. To learn the compact and discriminative representations, we further propose a novel temporal resolution-aware regularization, which enforces the consistency among different temporal resolutions for the same identities. We conduct extensive evaluations on four benchmarks, i.e. iLIDS-VID, PRID2011, MARS, and DukeMTMC-VideoReID; the experimental results show competitive performance, which demonstrates the effectiveness of our proposed method.
