2016
DOI: 10.1007/978-3-319-46478-7_9

A Siamese Long Short-Term Memory Architecture for Human Re-identification

Abstract: Matching pedestrians across multiple camera views, known as human re-identification (re-identification), is a challenging problem in visual surveillance. In the existing works concentrating on feature extraction, representations are formed locally and independent of other regions. We present a novel siamese Long Short-Term Memory (LSTM) architecture that can process image regions sequentially and enhance the discriminative capability of local feature representation by leveraging contextual information.…

Cited by 426 publications (321 citation statements)
References 59 publications (124 reference statements)

“…In this part, we also compare the proposed method and several supervised methods on Market1501. The compared methods include three non-deep-learning-based methods (LOMO + XQDA [72], BoW+Kissme [4] and DNSL [51]) and thirteen deep-learning-based methods (PersonNet [55], Gate S-CNN [56], LSTM S-CNN [57], DGDropout [5], Deep-Embed [59], SpindleNet [18], Part-Aligned [15], PIE [17], JLML [45], MTMCT [58], SVDNet [54], PDC [16] and A³M [14]). Particularly, PIE, MTMCT, SVDNet and A³M use the same backbone network (i.e., ResNet-50 [64]) with our method.…”
Section: Comparison With Supervised Methods (mentioning)
confidence: 99%
“…In fact, CNNs are able to extract different features from a given image, representing them as a set of output maps avoiding manual effort in feature engineering. Image-based Automatic Person Re-Identification is one of the fields in which CNNs achieved remarkable results [19,20,21,22,23,24].…”
Section: Related Work (mentioning)
confidence: 99%
“…It is possible to exploit the dependency among local regions by utilizing long short-term memory (LSTM) cells as the constituent components in the conventional Siamese network. Taking this intuition into account, in [33], the authors propose a LSTM-based architecture to process image regions sequentially and enhance the discriminative capability of local feature representation by leveraging contextual information. Following the same pipeline, the authors in [34] proposed a sequential fusion framework that combines the frame-wise appearance information as well as temporal information to generate a robust sequence-level human representation.…”
Section: Related Work (mentioning)
confidence: 99%
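
The idea described in the statement above, an LSTM that reads image regions in order inside a siamese (weight-sharing) pair, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the random weights, the concatenation of per-region hidden states, and the Euclidean distance are all assumptions made for illustration, and each region is assumed to be already reduced to a small feature vector.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_lstm_params(input_dim, hidden_dim, seed=0):
    """Random weights for the input (i), forget (f), output (o) and candidate (c) gates.

    Hypothetical helper: a trained model would learn these values instead.
    """
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {g: (matrix(hidden_dim, input_dim + hidden_dim), [0.0] * hidden_dim)
            for g in "ifoc"}


def lstm_step(params, x, h, c):
    """One standard LSTM step on region feature x, given previous state (h, c)."""
    z = x + h  # concatenate the region feature with the previous hidden state

    def gate(name, act):
        W, b = params[name]
        return [act(sum(w * v for w, v in zip(row, z)) + bias)
                for row, bias in zip(W, b)]

    i = gate("i", sigmoid)
    f = gate("f", sigmoid)
    o = gate("o", sigmoid)
    g = gate("c", math.tanh)
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def encode_regions(params, regions, hidden_dim):
    """Process regions sequentially; each hidden state carries context from earlier regions."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    outputs = []
    for x in regions:
        h, c = lstm_step(params, x, h, c)
        outputs.extend(h)  # assumption: concatenate the per-region hidden states
    return outputs


def siamese_distance(params, regions_a, regions_b, hidden_dim):
    """Encode both images with the SAME parameters and compare the embeddings."""
    ea = encode_regions(params, regions_a, hidden_dim)
    eb = encode_regions(params, regions_b, hidden_dim)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(ea, eb)))


if __name__ == "__main__":
    params = make_lstm_params(input_dim=2, hidden_dim=3)
    image_a = [[1.0, 0.0], [0.0, 1.0]]  # two regions, two features each
    image_b = [[0.0, 1.0], [1.0, 0.0]]
    print(siamese_distance(params, image_a, image_b, hidden_dim=3))
```

Because both branches share one parameter set, identical region sequences map to identical embeddings, and the representation of a later region changes when an earlier region changes, which is the contextual effect the statement refers to.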