2022
DOI: 10.1007/s00530-022-00982-y

A multitask joint framework for real-time person search

Cited by 4 publications (2 citation statements)
References 42 publications
“…To begin with, for feature representation of both 2D images and 3D models, a better backbone is always encouraged, which draws our attention to the trendy vision transformers (ViT) recently. It has proved to be a success in many relative computer vision and natural language processing (NLP) such as video event detection [16], pedestrian detection [17], person search [18,19], and text classification [20]. ViT takes the image patch or word embedding as a sequence of tokens, and applies the self-attention mechanism to capture the internal relationships thus obtaining strong feature representation connected with downstream tasks.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…There are numerous novel neural network architectures for lightweight YOLO algorithms, such as MobileNetV3 with the SE (squeeze excitation) attention mechanism and hand switch function, PP-LCNet with a similar structure, Ghost-Net, etc. [12][13][14][15][16][17]. B Ma et al proposed a Gaussian distance intersection over union (GDIoU) loss function and applied it to the YOLOV4 network [18], which increased the average precision by 7.37%.…”
Section: Introduction
Citation type: mentioning
confidence: 99%