Improving Description-Based Person Re-Identification by Multi-Granularity Image-Text Alignments

Niu, Kai; Huang, Yan; Ouyang, Wanli; Wang, Liang

doi:10.1109/tip.2020.2984883

Cited by 146 publications

(82 citation statements)

References 51 publications

“…We compare the results with feature embedding ([4, 7, 28]) and attention based methods ([10, 13, 29, 30]) in Table 2. The results show that our HAITA-Net achieves better performance.…”

Section: Performance Comparison (mentioning)

confidence: 99%

“…The results show that our HAITA-Net achieves better performance. Although MIA [29] and A-GANet [10] also learn the cross-modal representations by global or local associations, they weakly incorporate the importance of the informative word. Further, compared to A-GANet, we observe that for higher CMC ranks our model has significantly better performance.…”

Section: Performance Comparison (mentioning)

confidence: 99%

Hierarchical Attention Image-Text Alignment Network For Person Re-Identification

Kansal, Subramanyam, Wang, et al. 2021

2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW)

“…ResNet50 [127] LSTM [130] GLIA [82] CUHK-PEDES [85] Top-1 acc. ResNet-50 [127] Bi-GRU [145,146] MIA [135] CUHK-PEDES [85] Recall@1 (48.00%) Zhang et al [136] 2018…”

Section: Natural Language Description-based Person Retrieval (mentioning)

confidence: 99%

“…The GLIA approach gains a significant boost to the top-1 accuracy from the baselines and achieves 43. Multi-granularity Image-text Alignments (MIA) framework [135] adopts a multiple granularities (i.e., global-global, global-local, and local-local alignments) based approach for better similarity evaluations between text and image. The global context of image and description matches global-global granularity.…”

Section: Natural Language Description-based Person Retrieval (mentioning)

confidence: 99%

Person Retrieval in Surveillance Using Textual Query: A Review

Galiyawala, Raval 2021

Preprint

Recent advancement of research in biometrics, computer vision, and natural language processing has discovered opportunities for person retrieval from surveillance videos using textual query. The prime objective of a surveillance system is to locate a person using a description, e.g., a short woman with a pink t-shirt and white skirt carrying a black purse. She has brown hair. Such a description contains attributes like gender, height, type of clothing, colour of clothing, hair colour, and accessories. Such attributes are formally known as soft biometrics. They help bridge the semantic gap between a human description and a machine as a textual query contains the person’s soft biometric attributes. It is also not feasible to manually search through huge volumes of surveillance footage to retrieve a specific person. Hence, automatic person retrieval using vision and language-based algorithms is becoming popular. In comparison to other state-of-the-art reviews, the contribution of the paper is as follows: 1. Recommends most discriminative soft biometrics for specific challenging conditions. 2. Integrates benchmark datasets and retrieval methods for objective performance evaluation. 3. A complete snapshot of techniques based on features, classifiers, number of soft biometric attributes, type of the deep neural networks, and performance measures. 4. The comprehensive coverage of person retrieval from handcrafted features based methods to end-to-end approaches based on natural language description.

“…Furthermore, Li et al [21] have adopted deep filter pairing neural network for person re-identification, and Wang et al [40] have investigated a joint learning framework for re-id while unifying Single-Image Representation (SIR) and Cross-Image Representation (CIR) via convolutional neural networks. More recently, a multi-granularity image-text alignment has been proposed for person re-identification [31].…”

Section: Introduction (mentioning)

confidence: 99%

Person Re-Identification from different views based on dynamic linear combination of distances

Elaoud, Barhoumi, Drira, et al. 2021

Multimed Tools Appl

Person re-identification from videos taken by multiple cameras from different views is a very challenging problem that has attracted growing interest in last years. In fact, the same person from significant cross-view has different appearances from clothes change, illumination, and cluttered background. To deal with this issue, we use the skeleton information since it is not affected by appearance and pose variations. The skeleton as an input is projected on the Grassmann manifold in order to model the human motion as a trajectory. Then, we calculate the distance on the Grassmann manifold, in order to guarantee invariance against rotation, as well as local distances allowing to discriminate anthropometric for each person. The two distances are thereafter combined while defining dynamically the optimal combination weight for each person. Indeed, a machine learning process learns to predict the best weight for each person according to the rank metric of its re-identification results. Experimental results, using challenging 3D (IAS
