2021
DOI: 10.48550/arXiv.2106.15183
Preprint

Multi-Exit Vision Transformer for Dynamic Inference

Abstract: Deep neural networks can be converted to multi-exit architectures by inserting early-exit branches after some of their intermediate layers. This makes their inference process dynamic, which is useful for time-critical IoT applications with stringent latency requirements but time-variant communication and computation resources, such as edge computing systems and IoT networks where the exact computation time budget is variable and not known beforehand. Vision Transformer is a recently p…
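
To make the mechanism concrete, here is a minimal PyTorch sketch of a multi-exit Transformer encoder in the spirit of the abstract, not the paper's exact architecture: classification heads are attached after selected blocks, and inference stops at the first exit whose softmax confidence clears a threshold. The layer counts, exit placement, and threshold are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiExitViT(nn.Module):
    """Sketch of a ViT-style encoder with early-exit heads (illustrative)."""

    def __init__(self, embed_dim=192, depth=12, num_heads=3,
                 num_classes=10, exit_after=(3, 6, 9, 12)):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                       batch_first=True)
            for _ in range(depth)
        ])
        self.exit_after = set(exit_after)
        self.final_exit = max(exit_after)
        # One lightweight classification head per exit point.
        self.heads = nn.ModuleDict({
            str(i): nn.Linear(embed_dim, num_classes) for i in exit_after
        })
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))

    def forward(self, patch_tokens, threshold=0.9):
        # patch_tokens: (batch, seq_len, embed_dim) patch embeddings.
        cls = self.cls_token.expand(patch_tokens.size(0), -1, -1)
        x = torch.cat([cls, patch_tokens], dim=1)
        for i, blk in enumerate(self.blocks, start=1):
            x = blk(x)
            if i in self.exit_after:
                logits = self.heads[str(i)](x[:, 0])  # read off CLS token
                confidence = logits.softmax(dim=-1).max(dim=-1).values
                # Exit early once every sample in the batch is confident
                # enough, or unconditionally at the last exit point.
                if i == self.final_exit or bool(confidence.min() >= threshold):
                    return logits, i
        return logits, i  # reached only if depth exceeds the last exit
```

With these defaults, `MultiExitViT()(torch.randn(8, 196, 192))` returns the logits together with the index of the block at which the batch exited; lowering `threshold` trades accuracy for latency.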

Cited by 2 publications (3 citation statements)
References 18 publications
“…The proposed U-DeepSC is able to handle a number of tasks across three modalities, i.e., image, text, and speech, simultaneously. Since different tasks have different difficulties and require different numbers of layers to achieve satisfactory performance, the multi-exit architecture is developed by inserting early-exit modules after the intermediate layers of the decoder to provide early-exit results for relatively simple tasks [23], [24]. In addition, since more decoder layers are required to eliminate the impact of larger channel noise, we involve the channel noise in the training procedure to enable U-DeepSC to achieve dynamic inference with adaptive layers under different channel conditions.…”
Section: B. Motivation and Contributions
confidence: 99%
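
As a rough illustration of the training idea in this statement (a sketch under assumptions, not U-DeepSC's actual code), the loop below injects additive white Gaussian channel noise at a randomly sampled SNR into the transmitted latent at each step and trains all exits jointly; `encoder`, `decoder`, and the per-exit loss are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def awgn(z, snr_db):
    """Add white Gaussian noise to latent z at the given SNR (in dB)."""
    signal_power = z.pow(2).mean()
    noise_power = signal_power / (10 ** (snr_db / 10))
    return z + noise_power.sqrt() * torch.randn_like(z)

def train_step(encoder, decoder, batch, optimizer, snr_range=(0.0, 20.0)):
    optimizer.zero_grad()
    z = encoder(batch.inputs)
    # Sample a random channel condition per step (assumed training scheme).
    snr_db = torch.empty(1).uniform_(*snr_range).item()
    z_noisy = awgn(z, snr_db)
    # Assume the decoder returns one prediction per early-exit module;
    # summing the per-exit losses trains every exit jointly.
    exit_outputs = decoder(z_noisy)
    loss = sum(F.cross_entropy(out, batch.targets) for out in exit_outputs)
    loss.backward()
    optimizer.step()
    return loss.item()
```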
“…More candidates do not improve the results. The positions of exit candidates for the ResNet101 backbone are $e_i = 3i - 2$, $i \in [1, 11]$, so the interval between exit candidates in ResNet101 is three blocks and the first exit candidate's position is $e_1 = 1$. The positions of exit candidates for the BERT-base backbone are $e_i = i$, $i \in [1, 11]$.…”
Section: Task Type: Classification, Pose Estimation, Semantic Segmentati...
confidence: 99%
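
The two placement rules quoted above are simple enough to spell out in code; this helper is an illustration of the stated formulas, not the authors' implementation:

```python
def exit_positions(backbone: str, n: int = 11) -> list:
    """Exit-candidate layer indices per the quoted placement rules."""
    if backbone == "resnet101":
        # e_i = 3i - 2: one exit every three blocks, starting at block 1.
        return [3 * i - 2 for i in range(1, n + 1)]
    if backbone == "bert-base":
        # e_i = i: one exit after every layer.
        return list(range(1, n + 1))
    raise ValueError(f"unknown backbone: {backbone}")

print(exit_positions("resnet101"))  # [1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31]
print(exit_positions("bert-base"))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```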
“…In this case, the exit overhead directly affects the selection of the exit structure and exit placement. There is some prior algorithmic work on adding multiple exits to a DNN model for efficient inference [11,29,30,49,54]. The precision-latency tradeoff is also discussed in [33,34,43].…”
Section: Introduction
confidence: 99%