2022
DOI: 10.48550/arxiv.2205.09613
Preprint

Integral Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection

Abstract: Modern object detectors have taken advantage of pre-trained vision transformers by using them as backbone networks. However, apart from the backbone, other detector components, such as the detector head and the feature pyramid network, remain randomly initialized, which hinders consistency between the detector and the pre-trained model. In this study, we propose to integrally migrate the pre-trained transformer encoder-decoders (imTED) for object detection, constructing a feature extraction-operati…
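The abstract describes reusing both the encoder and the decoder of a pre-trained transformer (e.g., an MAE-pre-trained ViT) inside the detector, so that the detection head is no longer randomly initialized. The PyTorch sketch below is only an illustration of that idea under my own assumptions; the module names (ViTBlockStack, IntegrallyMigratedDetector) are hypothetical and not from the paper, and in practice the encoder and RoI head would be loaded from MAE encoder/decoder checkpoints rather than built from scratch.

```python
# Minimal sketch (assumption, not the authors' code) of the imTED idea:
# reuse pre-trained transformer blocks both as the backbone and as the
# RoI head, so no detector component is randomly initialized in spirit.
import torch
import torch.nn as nn


class ViTBlockStack(nn.Module):
    """Stand-in for a stack of pre-trained ViT blocks (encoder or decoder)."""

    def __init__(self, dim: int = 256, depth: int = 4, heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.blocks(tokens)


class IntegrallyMigratedDetector(nn.Module):
    """Hypothetical two-stage detector: MAE encoder as backbone,
    MAE decoder as the RoI head, no randomly initialized FPN in between."""

    def __init__(self, num_classes: int = 80, dim: int = 256):
        super().__init__()
        self.encoder = ViTBlockStack(dim)   # would load MAE encoder weights
        self.roi_head = ViTBlockStack(dim)  # would load MAE decoder weights
        self.cls_head = nn.Linear(dim, num_classes + 1)
        self.box_head = nn.Linear(dim, 4)

    def forward(self, patch_tokens: torch.Tensor, roi_tokens: torch.Tensor):
        feats = self.encoder(patch_tokens)     # image-level features
        roi_feats = self.roi_head(roi_tokens)  # per-RoI refinement
        pooled = roi_feats.mean(dim=1)         # simple token pooling
        return self.cls_head(pooled), self.box_head(pooled), feats


if __name__ == "__main__":
    # Toy usage: 196 patch tokens per image, 16 RoI tokens per proposal.
    detector = IntegrallyMigratedDetector()
    patches = torch.randn(2, 196, 256)
    rois = torch.randn(2, 16, 256)
    cls_logits, box_deltas, _ = detector(patches, rois)
    print(cls_logits.shape, box_deltas.shape)  # torch.Size([2, 81]) torch.Size([2, 4])
```

The point of the sketch is structural: the same type of pre-trained block serves as both the image-level feature extractor and the per-RoI head, which is the consistency between detector and pre-trained model that the abstract argues for.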

Cited by 3 publications (4 citation statements) | References 27 publications

“…Nevertheless, existing pre-trained Transformer models are capable of extracting meaningful features, which contributes to establishing a strong foundation for achieving impressive performance in oriented object detection tasks. Therefore, we adopt a design inspired by the imTED (Zhang et al. 2022b) detector and substitute the backbone as well as head modules of the two-stage detector with Vision Transformer blocks pre-trained using the MAE method.…”
Section: Overview (mentioning)
confidence: 99%
“…A classical transfer-learning method, LSTD [11] is based on SSD [36] and Faster-RCNN [7] and is trained under detection loss and regression loss to fine-tune the network. Furthermore, the following algorithms [15, 16, 17, 44] further improve the accuracy with approaches such as multiscale structures, contrastive learning, and so on.…”
Section: Related Work (mentioning)
confidence: 99%
“…Many FSOD approaches have been developed to improve the generalization ability of neural networks, which can be mainly divided into transfer-learning-based [11, 12, 13, 14, 15, 16, 17] and meta-learning-based [18, 19, 20, 21, 22, 23, 24, 25] methods. The former type aims to train suitable network parameters for invariant representation across domains and focuses on how to freeze fewer components of the detector without performance degradation [26, 27].…”
Section: Introduction (mentioning)
confidence: 99%
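The transfer-learning excerpt above mentions freezing fewer detector components without degrading performance. As a hedged illustration (my own example, not the method of any cited paper), the snippet below shows the common PyTorch pattern of freezing a detector's backbone while fine-tuning the remaining heads; the torchvision Faster R-CNN constructor is used only as a convenient stand-in.

```python
# Freeze the backbone of a detector and fine-tune only the heads.
# Assumes torchvision >= 0.13 (for the `weights` argument).
import torch
import torchvision

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None)

# Freeze the backbone; leave the RPN and RoI heads trainable.
for param in detector.backbone.parameters():
    param.requires_grad = False

# Optimize only the parameters that remain trainable.
trainable = [p for p in detector.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)
```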
“…In terms of enhancing the classifier and regressor, Li et al. [38] added a correction network in the support branch to refine the classification scores, and Huang et al. [39] proposed a dynamic classifier and semi-explicit regressor to improve generalizability. Most networks are based on Faster R-CNN [3]; specifically, the methods in [40, 41] were improved based on DETR [42] and ViT [43]. The parameter-based methods have simple structures compared with other methods.…”
Section: Few-shot Object Detection (mentioning)
confidence: 99%