2021 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv48922.2021.01474
TransReID: Transformer-based Object Re-Identification

Cited by 672 publications
(328 citation statements)
references
References 39 publications
3
324
1
Order By: Relevance
“…TTSR [Yang et al, 2020] restored the texture information of the image super-resolution result based on the transformer. TransReID [He et al, 2021] applied the transformer to the retrieval field for the first time and achieved results comparable to CNN-based methods.…”
Section: Transformer In Vision
Mentioning confidence: 83%
“…In addition, to express the global feature, we embed an additional embedding patch for the global feature representation. After passing through the Transformer Layer (the ViT-S used in this paper consists of a total of 8 self-attention blocks, with the same structure as TransReID [He et al, 2021]), we obtain features with the same dimension as the input, and the cls token is regarded as the final feature representation. The output is split into two branches: one converts the 768-dimensional ViT-S output into a 512-dimensional feature vector via a fully-connected layer to compute the Triplet Loss, and the other applies a fully-connected layer on this 512-dimensional feature vector for classification.…”
Section: Benchmark Introduction
Mentioning confidence: 99%
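The head described in this citation statement can be sketched roughly in PyTorch as below: the 768-to-512 projection and the two loss branches follow the quoted description, while the layer names and the number of identity classes (num_classes) are illustrative assumptions, not values from the citing paper.

```python
import torch
import torch.nn as nn

class ReIDHead(nn.Module):
    """Sketch: project the ViT cls token to a 512-dim embedding, then classify identities."""
    def __init__(self, in_dim=768, embed_dim=512, num_classes=1000):
        super().__init__()
        self.embed = nn.Linear(in_dim, embed_dim)            # 768 -> 512 projection
        self.classifier = nn.Linear(embed_dim, num_classes)  # identity classification head

    def forward(self, cls_token):
        feat = self.embed(cls_token)     # 512-dim feature used for the Triplet Loss
        logits = self.classifier(feat)   # logits used for the ID (cross-entropy) loss
        return feat, logits

# Example: a batch of cls tokens produced by the transformer backbone
head = ReIDHead()
feat, logits = head(torch.randn(8, 768))
```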
“…Some works learn the relationship between body parts. For example, Su et al [32] align image patches that contain the same body part; Xia et al [2] utilize a Non-local module to explore the importance of different parts; He et al [16] adopt the Vision Transformer [9] for image-based ReID feature extraction. These methods show excellent performance on public datasets such as Market-1501 [50] and MSMT17 [37].…”
Section: Person Re-identification
Mentioning confidence: 99%
“…It begins with the NPO (non-pedestrian occlusion) augmentation strategy, which produces image pairs and occlusion masks. Following [13], we simply adopt the Vision Transformer (ViT) [4] as the feature extractor. Position embeddings are added and a classification [cls] token is prepended to the patch sequence of the input image.…”
Section: Feature Erasing and Diffusion Network
Mentioning confidence: 99%
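A minimal PyTorch sketch of the input preparation this statement refers to (standard ViT tokenization with a prepended [cls] token and position embeddings, not the FED network itself); the image size, patch size, and embedding dimension are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ViTInput(nn.Module):
    """Sketch: patch embeddings with a prepended [cls] token and learnable position embeddings."""
    def __init__(self, img_size=(256, 128), patch_size=16, embed_dim=768):
        super().__init__()
        num_patches = (img_size[0] // patch_size) * (img_size[1] // patch_size)
        self.proj = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))

    def forward(self, x):
        x = self.proj(x).flatten(2).transpose(1, 2)      # (B, N, D) patch embeddings
        cls = self.cls_token.expand(x.size(0), -1, -1)   # (B, 1, D) classification token
        x = torch.cat([cls, x], dim=1)                   # prepend the [cls] token
        return x + self.pos_embed                        # add learnable position embeddings

# Example: a batch of 256x128 person images -> token sequence of shape (2, 129, 768)
tokens = ViTInput()(torch.randn(2, 3, 256, 128))
```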