2022 IEEE International Symposium on Multimedia (ISM)
DOI: 10.1109/ism55400.2022.00007
Complete Cross-triplet Loss in Label Space for Audio-visual Cross-modal Retrieval

Cited by 2 publications (4 citation statements)
References 21 publications
“…Recent works on audio-visual retrieval tasks exploit supervised representation learning methods to generate new features across modalities in a common space [13], [14], [15], [16], [39], [40], [41], [42], such that the audio-visual features can be measured directly. Inspired by the C-CCA [39] that aims at finding linear transformations for each modality, C-DCCA [40] tries to learn non-linear features in the common space by using deep learning methods.…”
Section: Audio-visual Retrieval Task
confidence: 99%
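The common-space idea attributed to C-CCA in the excerpt above — a learned linear transformation per modality so audio and visual features become directly comparable — can be sketched minimally as follows. The weight matrices here are hypothetical placeholders for illustration, not the transforms CCA actually learns:

```python
def project(features, weights):
    # Apply a per-modality linear transformation: each row of `weights`
    # produces one coordinate of the shared embedding space.
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

# Hypothetical learned maps: audio (2-D) and visual (2-D) features are both
# projected into the same 3-D common space, where distances are meaningful.
audio_emb = project([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
visual_emb = project([0.5, 1.5], [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
```

Once both modalities live in one space, retrieval reduces to nearest-neighbor search over these embeddings; C-DCCA replaces the linear `project` with deep non-linear networks.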
“…Inspired by the C-CCA [39] that aims at finding linear transformations for each modality, C-DCCA [40] tries to learn non-linear features in the common space by using deep learning methods. Deep-learning methods such as the TNN-C-CCA [13] and CCTL [16] models apply rank-based triplet losses as objective functions to optimize the predicted distances, achieving better results than other CCA-variant methods. The EICS model [42] learns two different common spaces to capture modality-common and modality-specific features, and achieves the SOTA results so far.…”
Section: Audio-visual Retrieval Task
confidence: 99%
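The triplet objective mentioned for TNN-C-CCA and CCTL can be illustrated with a minimal hinge-style sketch: pull a matching visual embedding toward its audio anchor while pushing a mismatched one at least a margin farther away. This is a simplified single-triplet form, not the papers' full loss (which aggregates many cross-modal triplets in both retrieval directions):

```python
import math

def euclidean(u, v):
    # Euclidean distance between two equal-length embedding vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cross_modal_triplet_loss(anchor_audio, pos_visual, neg_visual, margin=0.2):
    # Hinge loss: zero once the negative is at least `margin` farther
    # from the anchor than the positive; positive otherwise.
    return max(0.0, euclidean(anchor_audio, pos_visual)
                    - euclidean(anchor_audio, neg_visual) + margin)
```

For example, with the positive already much closer than the negative the loss is zero, so a well-separated triplet contributes no gradient; swapping the two visual features yields a large penalty instead.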