2021
DOI: 10.48550/arXiv.2106.15183
Preprint

Multi-Exit Vision Transformer for Dynamic Inference

Abstract: Deep neural networks can be converted to multi-exit architectures by inserting early-exit branches after some of their intermediate layers. This makes their inference process dynamic, which is useful for time-critical IoT applications with stringent latency requirements but time-variant communication and computation resources, such as edge computing systems and IoT networks where the exact computation time budget is variable and not known beforehand. Vision Transformer is a recently p…
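
To make the mechanism concrete, here is a minimal PyTorch sketch of a multi-exit Transformer encoder in the spirit of the abstract, not the paper's exact architecture: classification heads are attached after selected blocks, and inference stops at the first exit whose softmax confidence clears a threshold. The layer counts, exit placement, and threshold are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiExitViT(nn.Module):
    """Sketch of a ViT-style encoder with early-exit heads (illustrative)."""

    def __init__(self, embed_dim=192, depth=12, num_heads=3,
                 num_classes=10, exit_after=(3, 6, 9, 12)):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                       batch_first=True)
            for _ in range(depth)
        ])
        self.exit_after = set(exit_after)
        self.final_exit = max(exit_after)
        # One lightweight classification head per exit point.
        self.heads = nn.ModuleDict({
            str(i): nn.Linear(embed_dim, num_classes) for i in exit_after
        })
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))

    def forward(self, patch_tokens, threshold=0.9):
        # patch_tokens: (batch, seq_len, embed_dim) patch embeddings.
        cls = self.cls_token.expand(patch_tokens.size(0), -1, -1)
        x = torch.cat([cls, patch_tokens], dim=1)
        for i, blk in enumerate(self.blocks, start=1):
            x = blk(x)
            if i in self.exit_after:
                logits = self.heads[str(i)](x[:, 0])  # read off CLS token
                confidence = logits.softmax(dim=-1).max(dim=-1).values
                # Exit early once every sample in the batch is confident
                # enough, or unconditionally at the last exit point.
                if i == self.final_exit or bool(confidence.min() >= threshold):
                    return logits, i
        return logits, i  # reached only if depth exceeds the last exit
```

With these defaults, `MultiExitViT()(torch.randn(8, 196, 192))` returns the logits together with the index of the block at which the batch exited; lowering `threshold` trades accuracy for latency.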

Cited by 2 publications (3 citation statements)
References 18 publications
“…The proposed U-DeepSC is able to handle a number of tasks across three modalities, i.e., image, text, and speech, simultaneously. Since different tasks have different difficulties and require different numbers of layers to achieve satisfactory performance, the multi-exit architecture is developed by inserting early-exit modules after the intermediate layers of the decoder to provide early-exit results for relatively simple tasks [23], [24]. In addition, since more decoder layers are required to eliminate the impact of larger channel noise, we involve the channel noise in the training procedure to enable U-DeepSC to achieve dynamic inference with adaptive layers under different channel conditions.…”
Section: B. Motivation and Contributions
confidence: 99%
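
As a rough illustration of the training idea in this statement (a sketch under assumptions, not U-DeepSC's actual code), the loop below injects additive white Gaussian channel noise at a randomly sampled SNR into the transmitted latent at each step and trains all exits jointly; `encoder`, `decoder`, and the per-exit loss are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def awgn(z, snr_db):
    """Add white Gaussian noise to latent z at the given SNR (in dB)."""
    signal_power = z.pow(2).mean()
    noise_power = signal_power / (10 ** (snr_db / 10))
    return z + noise_power.sqrt() * torch.randn_like(z)

def train_step(encoder, decoder, batch, optimizer, snr_range=(0.0, 20.0)):
    optimizer.zero_grad()
    z = encoder(batch.inputs)
    # Sample a random channel condition per step (assumed training scheme).
    snr_db = torch.empty(1).uniform_(*snr_range).item()
    z_noisy = awgn(z, snr_db)
    # Assume the decoder returns one prediction per early-exit module;
    # summing the per-exit losses trains every exit jointly.
    exit_outputs = decoder(z_noisy)
    loss = sum(F.cross_entropy(out, batch.targets) for out in exit_outputs)
    loss.backward()
    optimizer.step()
    return loss.item()
```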
“…More candidates do not improve the results. The positions of exit candidates for the ResNet101 backbone are $e_i = 3i - 2$, $i \in [1, 11]$, so the interval between exit candidates in ResNet101 is three blocks and the first exit candidate's position is $e_1 = 1$. The positions of exit candidates for the BERT-base backbone are $e_i = i$, $i \in [1, 11]$.…”
Section: Task Type: Classification, Pose Estimation, Semantic Segmentati...
confidence: 99%
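
The two placement rules quoted above are simple enough to spell out in code; this helper is an illustration of the stated formulas, not the authors' implementation:

```python
def exit_positions(backbone: str, n: int = 11) -> list:
    """Exit-candidate layer indices per the quoted placement rules."""
    if backbone == "resnet101":
        # e_i = 3i - 2: one exit every three blocks, starting at block 1.
        return [3 * i - 2 for i in range(1, n + 1)]
    if backbone == "bert-base":
        # e_i = i: one exit after every layer.
        return list(range(1, n + 1))
    raise ValueError(f"unknown backbone: {backbone}")

print(exit_positions("resnet101"))  # [1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31]
print(exit_positions("bert-base"))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```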
“…In this case, the exit overhead directly affects the selection of the exit structure and exit placement. There is some prior algorithmic work on adding multiple exits to a DNN model for efficient inference [11,29,30,49,54]. The precision-latency tradeoff is also discussed in [33,34,43].…”
Section: Introduction
confidence: 99%