2023
DOI: 10.1007/s10489-023-04669-3

TIAR: Text-Image-Audio Retrieval with weighted multimodal re-ranking

citations
Cited by 3 publications
(1 citation statement)
references
References 49 publications
“…On the other hand, the single-stage task of image-speech involves using a self-supervised approach to learn large-scale image-speech pairs using visual and audio encoders separately (Li et al, 2023a; Reddy et al, 2021; Rodriguez et al, 2023;. Images and speech are embedded into a shared representation space to capture the interaction information between the two modalities (Chi et al, 2023; Chung et al, 2020; Huang et al, 2023b; Wang et al, 2024b).…”
Section: Introduction
Citation type: mentioning
confidence: 99%