Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence 2022
DOI: 10.24963/ijcai.2022/221
Learning to Hash Naturally Sorts

Abstract: Transformer-based architectures with grid features represent the state-of-the-art in visual and language reasoning tasks, such as visual question answering and image-text matching. However, directly applying them to image captioning may result in spatial and fine-grained semantic information loss. Their applicability to image captioning is still largely under-explored. Towards this goal, we propose a simple yet effective method, Spatial- and Scale-aware Transformer (S2 Transformer) for image captioning. Specif…

Cited by 9 publications (10 citation statements)
“…The quality of the hash codes obtained in this way is not high, which will have an impact on the final retrieval results. This can also explain why NSH [36] boosts highly on single-label datasets, but the boost of MAP on multi-label datasets is not very obvious.…”
Section: Introduction (mentioning; confidence: 99%)
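The statement above attributes NSH's smaller MAP gain on multi-label datasets to hash-code quality. As a hedged illustration of how MAP is commonly computed in hashing retrieval (not the evaluation code of NSH or of the citing paper), the sketch below ranks a database by Hamming distance to each query and averages the precision at every relevant hit. It assumes codes in {-1, +1}, multi-hot label vectors, and the usual multi-label convention that an item is relevant when it shares at least one label with the query; the function names and the `top_k` parameter are hypothetical.

```python
import numpy as np


def hamming_distance(query_code, db_codes):
    # Assumes codes in {-1, +1}; for K-bit codes the Hamming distance
    # equals (K - <b_q, b_i>) / 2.
    k = query_code.shape[0]
    return 0.5 * (k - db_codes @ query_code)


def mean_average_precision(query_codes, db_codes, query_labels, db_labels, top_k=None):
    """Hypothetical MAP over a Hamming ranking.

    Labels are multi-hot vectors. An item counts as relevant if it shares at
    least one label with the query; for single-label data this reduces to
    "same class".
    """
    aps = []
    for q_code, q_label in zip(query_codes, query_labels):
        relevant = (db_labels @ q_label) > 0
        dist = hamming_distance(q_code, db_codes)
        order = np.argsort(dist, kind="stable")  # ties keep database order
        if top_k is not None:
            order = order[:top_k]
        rel = relevant[order]
        if rel.sum() == 0:
            aps.append(0.0)  # conventions differ; some codebases skip such queries
            continue
        hits = np.cumsum(rel)                 # relevant items seen so far
        ranks = np.arange(1, len(rel) + 1)    # 1-based rank positions
        aps.append(float((hits[rel] / ranks[rel]).mean()))
    return float(np.mean(aps))


if __name__ == "__main__":
    queries = np.array([[1, 1, -1, -1]])
    database = np.array([[1, 1, -1, -1], [1, -1, -1, -1], [-1, -1, 1, 1]])
    q_labels = np.array([[1, 0]])
    db_labels = np.array([[1, 0], [0, 1], [1, 1]])
    # Relevant items land at ranks 1 and 3: AP = (1/1 + 2/3) / 2
    print(mean_average_precision(queries, database, q_labels, db_labels))
```

Under this relevance rule a multi-label item can count as relevant through a single shared label, which is one reason MAP behaves differently on multi-label benchmarks than on single-label ones.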
“…Earlier studies rely heavily on artificial annotations, which makes it difficult to apply in real-world scenarios due to the high labor costs. As a result, unsupervised deep hashing [27,22,23,36] has gradually become the major research direction in this field, with the recent boom in unsupervised learning [3,13,4,26,2,12]. The key difficulty with unsupervised hash is that the ad-hoc encoding process does not extract the key information for hashing, precisely because of the lack of supervised information.…”
Section: Introduction (mentioning; confidence: 99%)