Person search is a widely studied computer vision task that aims to jointly solve pedestrian detection and person re-identification (re-ID) in panoramic scenes. However, pedestrian detection focuses on what pedestrians have in common, whereas person re-ID attempts to extract features that discriminate between individual pedestrians. This inherent conflict greatly restricts research on one-stage person search methods. To address this issue, we propose a Vision Shared and Representation Isolated (VSRI) network that decouples the two conflicting subtasks by constructing an independent representation for each. To enhance the discriminative power of the re-ID representation, we further propose a Multi-Level Feature Fusion (MLFF) module. The MLFF adopts a Spatial Pyramid Feature Fusion (SPFF) module to obtain diverse features from the stem network. Moreover, a multi-head self-attention mechanism is employed to construct a Multi-head Attention Driven Extraction (MADE) module, and a cascaded convolution unit is adopted to devise a Feature Decomposition and Cascaded Integration (FDCI) module; together these help the MLFF obtain more discriminative representations of pedestrians. The proposed method outperforms state-of-the-art methods on mainstream person search datasets.
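The idea of sharing visual features while isolating the two task representations can be illustrated with a toy sketch. This is not the VSRI implementation: the class `SharedStemTwoHeads`, the layer sizes, the random weights, and the use of plain dense layers in place of a convolutional stem and the MLFF module are all assumptions made purely for illustration. It shows only that one shared feature computation can feed two heads with separate parameters, so that changing the detection head leaves the re-ID embedding untouched.

```python
import math
import random


def linear(weights, x):
    """Apply a dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def l2_normalize(x):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]


def make_weights(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class SharedStemTwoHeads:
    """Hypothetical toy model: one shared stem, two isolated task heads."""

    def __init__(self, in_dim=8, stem_dim=6, reid_dim=4, seed=0):
        rng = random.Random(seed)
        self.stem = make_weights(rng, stem_dim, in_dim)         # shared by both tasks
        self.det_head = make_weights(rng, 1, stem_dim)          # detection only
        self.reid_head = make_weights(rng, reid_dim, stem_dim)  # re-ID only

    def forward(self, x):
        shared = relu(linear(self.stem, x))           # computed once
        det_logit = linear(self.det_head, shared)[0]  # detection branch
        det_score = 1.0 / (1.0 + math.exp(-det_logit))
        reid_embedding = l2_normalize(linear(self.reid_head, shared))  # re-ID branch
        return det_score, reid_embedding


model = SharedStemTwoHeads()
features = [0.1 * (i + 1) for i in range(8)]  # stand-in for image features
score, embedding = model.forward(features)
```

Because the two heads hold separate parameters, each branch can be trained toward its own objective without overwriting the other's representation; only the stem is shared.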