Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence 2018
DOI: 10.24963/ijcai.2018/302
Energy-efficient Amortized Inference with Cascaded Deep Classifiers

Abstract: Deep neural networks have been remarkably successful in various AI tasks, but they often incur high computation and energy costs, which is problematic for energy-constrained applications such as mobile sensing. We address this problem by proposing a novel framework that optimizes prediction accuracy and energy cost simultaneously, thus enabling an effective cost-accuracy trade-off at test time. In our framework, each data instance is pushed into a cascade of deep neural networks with increasing sizes, and a selection module is used to seq…
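The cascade idea in the abstract can be sketched minimally: run the cheapest model first and only fall through to larger models when needed. The toy models, the fixed softmax-confidence threshold, and the stopping rule below are illustrative assumptions — the paper's selection module is learned, not a hand-set threshold:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a score vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def cascaded_predict(x, models, threshold=0.9):
    """Run models cheapest-first; stop at the first one whose top-class
    softmax confidence clears the threshold (or at the last model).
    Returns (predicted class, index of the model that answered)."""
    for i, model in enumerate(models):
        probs = softmax(model(x))
        if probs.max() >= threshold or i == len(models) - 1:
            return int(probs.argmax()), i

# Toy "models": linear scorers of increasing sharpness, standing in for
# small and large DNNs. A sharper scorer yields more confident softmaxes.
small = lambda x: 1.0 * x
large = lambda x: 5.0 * x
label, exited_at = cascaded_predict(np.array([0.2, -0.1, 0.1]), [small, large])
```

Easy instances exit at the small model and never pay for the large one; hard instances fall through, which is where the amortized energy savings come from.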

Cited by 12 publications (10 citation statements). References 6 publications.
“…3 (b)), where early features can be propagated to deep layers if needed. Based on such an architecture design, early exiting can be achieved according to confidence-based criteria [43], [48] or learned decision functions [44], [49], [50], [51]. Note that the confidence-based exiting policy consumes no extra computation during inference, while it usually requires tuning the threshold(s) on the validation set.…”
Section: Dynamic Depthmentioning
confidence: 99%
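The quoted point — that a confidence-based exit policy costs nothing extra at inference but needs its threshold tuned on a validation set — can be illustrated with a small sketch. The function name and the early-exit accuracy-target criterion are assumptions for illustration, not the tuning procedure of the cited papers:

```python
import numpy as np

def tune_exit_threshold(confidences, correct, target_acc=0.95):
    """Pick the smallest confidence threshold t such that validation samples
    that would exit early (confidence >= t) are at least target_acc accurate.

    confidences: max-softmax scores from an intermediate classifier on the
                 validation set; correct: boolean array indicating whether
                 that classifier's prediction was right for each sample."""
    for t in np.sort(np.unique(confidences)):  # candidate thresholds
        mask = confidences >= t
        if mask.any() and correct[mask].mean() >= target_acc:
            return float(t)
    return 1.0  # no threshold meets the target; never exit early
```

Lower thresholds let more samples exit early (more compute saved) at the price of early-exit accuracy; the tuning happens once, offline, so inference itself only pays for a single comparison.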
“…In the Social Media, Video Monitoring, and TF Cascade pipelines, a subset of models are invoked based on the output of earlier models in the pipeline. This conditional evaluation pattern appears in bandit algorithms [3,20] used for model personalization as well as more general cascaded prediction pipelines [2,14,24,34].…”
Section: Background and Motivationmentioning
confidence: 96%
“…Dynamic inference (DI) predicts different samples using data-dependent architectures or parameters, thereby improving the inference efficiency or the model's representation power [10]. Specifically, early exiting methods allow samples (easy to classify) to be predicted using the early outputs of cascaded DNNs [33] or networks with multiple intermediate classifiers [8]. Moreover, skipping methods selectively activate the model components, e.g., layers [9], branches [23], or sub-networks [2] conditioned on the sample.…”
Section: Related Workmentioning
confidence: 99%