2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
DOI: 10.1109/cvprw53098.2021.00457
SBNet: Segmentation-based Network for Natural Language-based Vehicle Search

Cited by 6 publications (4 citation statements)
References 15 publications
“…We compare our OMG with previous state-of-the-art methods in Table 3. It is shown that our …

Team                           MRR
OMG (ours)                     0.3012
Alibaba-UTS-ZJU [1]            0.1869
SDU-XidianU-SDJZU [38]         0.1613
SUNYKorea [33]                 0.1594
Sun Asterisk [30]              0.1571
HCMUS [31]                     0.1560
TUE [37]                       0.1548
JHU-UMD [14]                   0.1364
Modulabs-Naver-KookminU [15]   0.1195
Unimore [36]                   0.1078…”
Section: Evaluation Results
confidence: 99%
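The MRR (Mean Reciprocal Rank) scores quoted above can be reproduced from per-query ranks: each query contributes the reciprocal of the rank at which its correct vehicle first appears, averaged over all queries. A minimal sketch of the metric (illustrative only, not the cited teams' evaluation code):

```python
def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the first correct result for each query."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# e.g. three queries whose correct vehicle appears at ranks 1, 2, and 4
score = mean_reciprocal_rank([1, 2, 4])  # (1 + 0.5 + 0.25) / 3 ≈ 0.583
```

A perfect system (every correct vehicle ranked first) scores MRR = 1.0, which puts the 0.30 vs. 0.10–0.19 spread in the table in context.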
See 1 more Smart Citation
“…We compare our OMG with previous state-of-the-art methods in Table 3. It is shown that our Team MRR OMG(ours) 0.3012 Alibaba-UTS-ZJU [1] 0.1869 SDU-XidianU-SDJZU [38] 0.1613 SUNYKorea [33] 0.1594 Sun Asterisk [30] 0.1571 HCMUS [31] 0.1560 TUE [37] 0.1548 JHU-UMD [14] 0.1364 Modulabs-Naver-KookminU [15] 0.1195 Unimore [36] 0.1078…”
Section: Evaluation Resultsmentioning
confidence: 99%
“…AYCE [36] proposes a modular solution which applies BERT [41] to embed textual descriptions and a CNN [10] with a Transformer model [43] to embed visual information. SBNet [15] presents a substitution module that helps project features from different domains into the same space, and a future prediction module to learn temporal information by predicting the next frame. Pirazh et al [14] and Tam et al [30] adopt CLIP [35] to extract frame features and textual features.…”
Section: Text-based Vehicle Retrieval
confidence: 99%
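The systems described above share a common retrieval step: embed the natural-language query and the candidate vehicle tracks into one space, then rank tracks by similarity. A minimal cosine-ranking sketch, assuming hypothetical pre-computed embeddings (the actual encoders in the cited works are BERT/CLIP text models and CNN/Transformer visual models):

```python
import numpy as np

def cosine_rank(text_emb, track_embs):
    """Rank candidate tracks by cosine similarity to the query embedding.

    text_emb:   (d,) query embedding from a text encoder
    track_embs: (n, d) one embedding per candidate vehicle track
    Returns track indices, best match first.
    """
    t = text_emb / np.linalg.norm(text_emb)
    v = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    scores = v @ t                 # cosine similarity per track
    return np.argsort(-scores)     # descending similarity
```

MRR is then computed from where the ground-truth track lands in this ordering for each query.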
“…In the 5th NVIDIA AI City Challenge, the majority of teams [2], [16], [17], [18], [19], [20] chose to extract sentence embeddings of the queries, whereas two teams [21], [22] processed the NL queries using conventional NLP techniques. For cross-modality learning, certain teams [20], [2] used ReID models with the adoption of vision models pre-trained on visual ReID data and language models pre-trained on the given queries from the dataset.…”
Section: Related Work, A. Natural Language-based Vehicle-based Video Re...
confidence: 99%
“…The motion of vehicles is an integral component of the NL descriptions. Consequently, a number of teams [2], [18], [22] have developed specific methods for measuring and representing vehicle motion patterns.…”
Section: Related Work, A. Natural Language-based Vehicle-based Video Re...
confidence: 99%