2023
DOI: 10.1609/aaai.v37i2.25267
|View full text |Cite
|
Sign up to set email alerts
|

Token Mixing: Parameter-Efficient Transfer Learning from Image-Language to Video-Language

Abstract: Applying large scale pre-trained image-language model to video-language tasks has recently become a trend, which brings two challenges. One is how to effectively transfer knowledge from static images to dynamic videos, and the other is how to deal with the prohibitive cost of fully fine-tuning due to growing model size. Existing works that attempt to realize parameter-efficient image-language to video-language transfer learning can be categorized into two types: 1) appending a sequence of temporal transformer … Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Year Published

2024
2024
2024
2024

Publication Types

Select...
1

Relationship

0
1

Authors

Journals

citations
Cited by 1 publication
references
References 39 publications
0
0
0
Order By: Relevance