Attention mechanisms have been found effective for person re-identification (Re-ID). However, the learned "attentive" features are often not naturally uncorrelated or "diverse", which compromises retrieval performance based on the Euclidean distance. We advocate the complementary powers of attention and diversity for Re-ID by proposing an Attentive but Diverse Network (ABD-Net). ABD-Net seamlessly integrates attention modules and diversity regularizations throughout the entire network to learn features that are representative, robust, and more discriminative. Specifically, we introduce a pair of complementary attention modules, focusing on channel aggregation and position awareness, respectively. We then plug in a novel orthogonality constraint that efficiently enforces diversity on both hidden activations and weights. Through an extensive set of ablation studies, we verify that the attentive and diverse terms each contribute to the performance gains of ABD-Net. It consistently outperforms existing state-of-the-art methods on three popular person Re-ID benchmarks.

[Table footnotes: * This method also exploits attention mechanisms. • This result uses a ResNet-152 backbone. This result uses a DenseNet-121 backbone. ‡ Official code is not released; we report the numbers from the original paper, which are better than our re-implementation.]

The improvement becomes 3.40% for top-1 and 6.40% for mAP. We also considered SVDNet [13] and HA-CNN [50], which also proposed to generate diverse and uncorrelated feature embeddings. ABD-Net surpasses both with significant top-1 and mAP improvements. Overall, our observations endorse the superiority of ABD-Net by combining "attentive" and "diverse".
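To make the diversity idea concrete, the following is a minimal sketch of a generic soft orthogonality penalty of the form ||W Wᵀ − I||²_F, a common surrogate for enforcing uncorrelated (orthogonal) rows of a weight or activation matrix. This is an illustrative stand-in, not necessarily the exact regularizer used in ABD-Net; the function name `soft_orthogonality_penalty` is our own.

```python
import numpy as np

def soft_orthogonality_penalty(W):
    """Generic soft orthogonality penalty ||W W^T - I||_F^2.

    Illustrative sketch of a diversity regularizer; ABD-Net's exact
    orthogonality term may differ in form. Rows of W are treated as
    the feature vectors (e.g. flattened conv kernels) to decorrelate.
    """
    W = W.reshape(W.shape[0], -1)     # flatten kernels to a 2-D matrix
    gram = W @ W.T                    # Gram matrix of the rows
    eye = np.eye(W.shape[0])
    return float(np.sum((gram - eye) ** 2))

# A matrix with orthonormal rows incurs (near-)zero penalty.
Q, _ = np.linalg.qr(np.random.randn(8, 8))
print(round(soft_orthogonality_penalty(Q), 6))  # → 0.0
```

In training, such a penalty would be added to the task loss with a small weight; minimizing it pushes the rows of W toward orthonormality, i.e. toward the "diverse" features the text describes.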
Visualizations

Attention Pattern Visualization: We conduct a set of attention visualizations** on the final output feature maps of the baseline (XE), baseline (XE) + PAM + CAM, and ABD-Net (XE), as shown in Fig. 5. We notice that the feature maps from the baseline show little attentiveness. PAM +

** Grad-CAM visualization method [73]: https://github.com/utkuozbulak/pytorch-cnn-visualizations; RAM visualization method [74] for testing images. More results can be found in the supplementary.
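For reference, the core of the Grad-CAM technique cited above [73] can be sketched in a few lines: channel weights are the spatially averaged gradients of the score with respect to a feature map, and the heatmap is a ReLU of the weighted sum of channels. This is a generic sketch of the cited method, not the authors' exact visualization pipeline; `grad_cam_map` is a hypothetical helper name.

```python
import numpy as np

def grad_cam_map(activations, gradients):
    """Minimal Grad-CAM-style heatmap (sketch of [73]).

    activations, gradients: arrays of shape (C, H, W) taken from a
    chosen layer. Each channel is weighted by its spatially averaged
    gradient, the weighted channels are summed, and negative values
    are clipped (ReLU) before normalizing to [0, 1].
    """
    weights = gradients.mean(axis=(1, 2))             # (C,) channel weights
    cam = np.tensordot(weights, activations, axes=1)  # (H, W) weighted sum
    cam = np.maximum(cam, 0.0)                        # keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1]
    return cam
```

In practice the resulting map is upsampled to the input resolution and overlaid on the image, which is what the qualitative attention figures in papers like this typically show.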