Person re-identification (re-id) faces two major challenges: the lack of cross-view paired training data and the difficulty of learning discriminative, identity-sensitive yet view-invariant features in the presence of large pose variations. In this work, we address both problems by proposing a novel deep person image generation model for synthesizing realistic person images conditioned on pose. The model is based on a generative adversarial network (GAN) designed specifically for pose normalization in re-id, and is thus termed pose-normalization GAN (PN-GAN). With the synthesized images, we can learn a new type of deep re-id feature that is free of the influence of pose variations. We show that this feature is strong on its own and complementary to features learned from the original images. Importantly, under the transfer learning setting, we show that our model generalizes well to any new re-id dataset without the need to collect any training data for model fine-tuning. The model thus has the potential to make re-id models truly scalable.
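To make the pose-conditioning idea concrete, below is a minimal PyTorch sketch of a generator/discriminator pair in the spirit of PN-GAN. All module names, channel sizes, and the 18-channel keypoint-heatmap pose encoding are illustrative assumptions, not the authors' exact architecture.

# Minimal sketch of a pose-conditioned GAN for pose normalization.
# The layer layout and the 18-channel pose-heatmap encoding are assumptions.
import torch
import torch.nn as nn

class PoseConditionedGenerator(nn.Module):
    """Maps (source image, target pose map) to an image of the same person in the target pose."""
    def __init__(self, img_ch=3, pose_ch=18, base=64):
        super().__init__()
        self.net = nn.Sequential(
            # Encoder: image and pose map are concatenated along channels.
            nn.Conv2d(img_ch + pose_ch, base, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            # Decoder: upsample back to the input resolution.
            nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base, img_ch, 4, stride=2, padding=1),
            nn.Tanh(),  # output images in [-1, 1]
        )

    def forward(self, img, pose):
        return self.net(torch.cat([img, pose], dim=1))

class Discriminator(nn.Module):
    """PatchGAN-style critic scoring the realism of (image, pose) pairs."""
    def __init__(self, img_ch=3, pose_ch=18, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_ch + pose_ch, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, 1, 4, stride=1, padding=1),  # patch-level real/fake map
        )

    def forward(self, img, pose):
        return self.net(torch.cat([img, pose], dim=1))

# Usage: synthesize the same person under a canonical pose, then feed the
# result to a re-id feature extractor to obtain a pose-free feature.
G = PoseConditionedGenerator()
img = torch.randn(2, 3, 128, 64)    # source person crops
pose = torch.randn(2, 18, 128, 64)  # target-pose keypoint heatmaps
fake = G(img, pose)                 # shape: (2, 3, 128, 64)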
Person re-identification (re-id) aims to match people across non-overlapping camera views in a public space. It is a challenging problem because many people captured in surveillance videos wear similar clothes. Consequently, the differences in their appearance are often subtle and detectable only at the right locations and scales. Existing re-id models, particularly the recently proposed deep learning based ones, match people at a single scale. In contrast, in this paper, a novel multi-scale deep learning model is proposed. Our model is able to learn deep discriminative feature representations at different scales and automatically determine the most suitable scales for matching. The importance of different spatial locations for extracting discriminative features is also learned explicitly. Experiments are carried out to demonstrate that the proposed model outperforms the state-of-the-art on a number of benchmarks.
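As a rough illustration of the multi-scale idea, the following PyTorch sketch extracts features through parallel branches with different receptive fields and learns a softmax weighting that decides how much each scale contributes to the fused representation. The branch design and the weighting scheme are assumptions for illustration, not the paper's exact model.

# Minimal sketch of multi-scale feature learning with learned scale weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleBlock(nn.Module):
    """Parallel branches with different kernel sizes; learned softmax
    weights determine how much each scale contributes."""
    def __init__(self, in_ch, out_ch, scales=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in scales
        )
        # One logit per scale, turned into mixing weights at forward time.
        self.scale_logits = nn.Parameter(torch.zeros(len(scales)))

    def forward(self, x):
        feats = torch.stack([F.relu(b(x)) for b in self.branches])  # (S, B, C, H, W)
        w = torch.softmax(self.scale_logits, dim=0)                 # (S,)
        return (w.view(-1, 1, 1, 1, 1) * feats).sum(dim=0)          # (B, C, H, W)

x = torch.randn(2, 3, 128, 64)
block = MultiScaleBlock(3, 32)
print(block(x).shape)  # torch.Size([2, 32, 128, 64])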
Person re-identification (re-id) aims to match people across non-overlapping camera views in a public space. This is a challenging problem because the people captured in surveillance videos often wear similar clothing. Consequently, the differences in their appearance are typically subtle and detectable only at particular locations and scales. In this paper, we propose a deep re-id network (MuDeep) that is composed of two novel types of layers: a multi-scale deep learning layer and a leader-based attention learning layer. Specifically, the former learns deep discriminative feature representations at different scales, while the latter utilizes the information from multiple scales to lead and determine the optimal weighting for each scale. The importance of different spatial locations for extracting discriminative features is learned explicitly via our leader-based attention learning layer. Extensive experiments demonstrate that the proposed MuDeep outperforms the state-of-the-art on a number of benchmarks and exhibits better generalization ability under a domain generalization setting.
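The abstract does not spell out the internals of the leader-based attention layer, so the following PyTorch sketch only illustrates one plausible reading: a "leader" that aggregates features from all scale branches and emits a per-scale spatial attention map. All names and wiring here are hypothetical.

# Minimal sketch of a leader-based attention layer over multi-scale features.
# The concrete wiring is an illustrative assumption.
import torch
import torch.nn as nn

class LeaderAttention(nn.Module):
    def __init__(self, ch, n_scales):
        super().__init__()
        # Leader: sees the concatenation of all scales and outputs one
        # attention map per scale (softmax over the scale axis per pixel).
        self.leader = nn.Conv2d(ch * n_scales, n_scales, kernel_size=1)

    def forward(self, scale_feats):
        # scale_feats: list of n_scales tensors, each of shape (B, ch, H, W)
        stacked = torch.stack(scale_feats, dim=1)          # (B, S, ch, H, W)
        attn = self.leader(torch.cat(scale_feats, dim=1))  # (B, S, H, W)
        attn = torch.softmax(attn, dim=1).unsqueeze(2)     # (B, S, 1, H, W)
        return (attn * stacked).sum(dim=1)                 # (B, ch, H, W)

feats = [torch.randn(2, 32, 64, 32) for _ in range(3)]
out = LeaderAttention(32, 3)(feats)
print(out.shape)  # torch.Size([2, 32, 64, 32])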