2018
DOI: 10.48550/arxiv.1812.10305
Preprint

Spatial and Temporal Mutual Promotion for Video-based Person Re-identification

Cited by 1 publication (4 citation statements)
References 12 publications

“…Learning a discriminative clip-level feature is crucial to video-based person re-id. Most previous work is dedicated to aggregating frame feature vectors across the temporal dimension into a clip-level feature [6,16,17,18,35]. In [35], the authors extract and aggregate the temporal and spatial information between consecutive frames simultaneously with a one-stream neural network.…”
Section: Related Work
confidence: 99%
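The statement above describes the common pipeline of aggregating per-frame feature vectors across the temporal dimension into a single clip-level descriptor. Below is a minimal sketch of the simplest such aggregator, temporal average pooling; the tensor shapes and the ResNet-50 backbone mentioned in the comment are illustrative assumptions, not details taken from the cited papers.

```python
import torch
import torch.nn as nn

class TemporalAvgPool(nn.Module):
    """Aggregate per-frame feature vectors into one clip-level feature
    by averaging across the temporal dimension."""

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, num_frames, feat_dim)
        return frame_feats.mean(dim=1)  # -> (batch, feat_dim)

# Illustrative usage: a batch of 4 clips, 8 frames each, with 2048-d
# per-frame features (e.g. from a ResNet-50 backbone).
clip_feat = TemporalAvgPool()(torch.randn(4, 8, 2048))  # -> (4, 2048)
```

More elaborate aggregators in the literature replace this mean with attention-weighted or recurrent pooling, but the input/output contract stays the same: many frame vectors in, one clip vector out.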
“…Liao et al. [16] employed a succession of 3D convolution kernels pre-trained on Kinetics to extract spatial and temporal features simultaneously from a video volume, which keeps intra-clip consistency and learns the context information of local appearance patches. On the other hand, temporal alignment is a key factor in temporal pooling performance [15,18,24]. Li et al. [15] created a compact encoding of the video that exploits useful partial information in each frame.…”
Section: Related Work
confidence: 99%
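This second statement contrasts 3D convolution, which convolves jointly over time and space within a video volume, with temporal-pooling approaches whose performance hinges on alignment. The sketch below shows the core 3D-convolution operation on a clip; all layer sizes are illustrative assumptions rather than the architecture of [16], and Kinetics pre-training is only noted in a comment.

```python
import torch
import torch.nn as nn

# A minimal sketch of spatio-temporal feature extraction with 3D convolution,
# in the spirit of [16]. Layer sizes are illustrative assumptions; in practice
# the 3D kernels would be initialized from Kinetics-pre-trained weights.
conv3d_block = nn.Sequential(
    nn.Conv3d(3, 64, kernel_size=(3, 3, 3), padding=1),  # convolves jointly over (time, H, W)
    nn.BatchNorm3d(64),
    nn.ReLU(inplace=True),
    nn.AdaptiveAvgPool3d(1),  # pool over time and space to one vector per clip
    nn.Flatten(),             # (batch, 64, 1, 1, 1) -> (batch, 64)
)

# Illustrative usage: a batch of 4 clips of 8 RGB frames at 128x64 (person crops).
video = torch.randn(4, 3, 8, 128, 64)  # (batch, channels, time, H, W)
clip_feat = conv3d_block(video)        # -> (4, 64)
```

Because the kernel spans neighboring frames, each output activation already mixes appearance and short-range motion, which is why this route does not require the explicit frame alignment that temporal pooling does.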