2024
DOI: 10.1609/aaai.v38i5.28205

Unifying Visual and Vision-Language Tracking via Contrastive Learning

Yinchao Ma, Yuyang Tang, Wenfei Yang, et al.

Abstract: Single object tracking aims to locate the target object in a video sequence according to the state specified by different modal references, including the initial bounding box (BBOX), natural language (NL), or both (NL+BBOX). Due to the gap between different modalities, most existing trackers are designed for only one or a subset of these reference settings and overspecialize on a specific modality. In contrast, we present a unified tracker called UVLTrack, which can simultaneously handle all three reference set…

Cited by 3 publications
References 29 publications