Proceedings of the International Conference on Computer-Aided Design 2018
DOI: 10.1145/3240765.3243494

Searching Toward Pareto-Optimal Device-Aware Neural Architectures

Abstract: Recent breakthroughs in Neural Architecture Search (NAS) have achieved state-of-the-art performance in many tasks such as image classification and language understanding. However, most existing works optimize only for model accuracy and largely ignore other important factors imposed by the underlying hardware and devices, such as latency and energy, when performing inference. In this paper, we first introduce the problem of NAS and provide a survey on recent works. Then we dive deep into two recent advancements o…
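
As a rough illustration of the device-aware objectives discussed in the abstract, the Python sketch below scalarizes accuracy, latency, and energy into a single reward that a search controller could maximize instead of accuracy alone. The weighting scheme, coefficient values, and measurement inputs are assumptions made for illustration, not the formulation used in the paper.

# Illustrative sketch only: a simple scalarized device-aware NAS objective.
# The weights and measured quantities below are hypothetical.

def device_aware_reward(accuracy: float,
                        latency_ms: float,
                        energy_mj: float,
                        lambda_latency: float = 0.01,
                        lambda_energy: float = 0.001) -> float:
    """Combine task accuracy with hardware costs into a single reward.

    A search controller would maximize this reward instead of accuracy alone,
    trading a little accuracy for models that run faster and cheaper on the
    target device.
    """
    return accuracy - lambda_latency * latency_ms - lambda_energy * energy_mj


# Example: two candidate architectures measured on the same device.
candidate_a = device_aware_reward(accuracy=0.94, latency_ms=120.0, energy_mj=300.0)
candidate_b = device_aware_reward(accuracy=0.92, latency_ms=40.0, energy_mj=90.0)
print(candidate_a, candidate_b)  # candidate_b wins despite lower accuracy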

Cited by 28 publications (19 citation statements) · References 35 publications

“…By default, both Auto-WEKA and AutoSklearn optimize for only one metric, such as the error rate or accuracy. MONAS and DPP-Net [36], on the other hand, are natural extensions that search and optimize for multiple device-agnostic and device-aware constraints, yielding progressively better models across all optimization objectives. The outcome of this process is a set of tuples of objective performances, from which we select the Pareto-optimal ones, that is, those not outperformed in every objective by any other candidate.…”
Section: Binary Classification With Traditional Machine Learning Methods (mentioning, confidence: 99%)
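
To make the Pareto-selection step in the quoted passage concrete, here is a minimal sketch that filters a set of (accuracy, latency, energy) tuples down to the non-dominated ones. The candidate values and the sign conventions (higher accuracy is better, lower latency and energy are better) are illustrative assumptions, not results from MONAS or DPP-Net.

# Minimal sketch of Pareto-front selection over (accuracy, latency, energy)
# tuples, as produced by a multi-objective search. Candidate values are invented.

def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly better in one."""
    acc_a, lat_a, eng_a = a
    acc_b, lat_b, eng_b = b
    no_worse = acc_a >= acc_b and lat_a <= lat_b and eng_a <= eng_b
    strictly_better = acc_a > acc_b or lat_a < lat_b or eng_a < eng_b
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other != c)]


models = [
    (0.94, 120.0, 300.0),  # accurate but slow and power-hungry
    (0.92, 40.0, 90.0),    # good trade-off
    (0.90, 45.0, 100.0),   # dominated by the model above
]
print(pareto_front(models))  # the first two models survive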
“…Similar to MONAS, it optimizes device-related (e.g., memory usage) and device-agnostic (e.g., accuracy or model size) objectives. Both approaches were evaluated in [36], showing that both frameworks are effective and able to achieve Pareto-optimality with respect to the given objectives. While both resource-aware optimization frameworks are closely related to the objectives of our research, open-source implementations of MONAS and DPP-Net were not available for evaluation and adaptation purposes.…”
Section: Related Work (mentioning, confidence: 99%)
“…Neural Architecture Search (NAS) [26] is an emerging approach in which pruning and quantization are embedded into a global search where the topological parameters of the ConvNet, e.g., number of layers, number of filters, connections between layers, etc., also take part in the objective function…”
Section: Neural Architecture Search (mentioning, confidence: 99%)
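
As a hedged sketch of what it means for topological parameters to enter the search objective, the toy example below encodes depth, filter width, and skip connections as searchable choices and scores a sampled candidate with a simple accuracy-minus-memory-penalty objective. The search space, value ranges, and penalty term are hypothetical and not taken from [26].

# Hypothetical NAS candidate encoding: topology is part of what gets searched.
import random

SEARCH_SPACE = {
    "num_layers": [6, 10, 14, 18],
    "filters_per_layer": [16, 32, 64, 128],
    "use_skip_connections": [True, False],
}

def sample_architecture(rng=random):
    """Draw one candidate topology from the search space."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def objective(arch, accuracy, memory_kb, max_memory_kb=512.0):
    """Toy multi-objective score: reward accuracy, penalize memory overruns."""
    penalty = max(0.0, memory_kb - max_memory_kb) / max_memory_kb
    return accuracy - penalty

arch = sample_architecture()
print(arch, objective(arch, accuracy=0.91, memory_kb=640.0))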
“…ConvNets are compute- and memory-intensive models that need aggressive compression to fit low-power CPUs. Along with custom compression pipelines based on quantization [4], pruning [5] and neural architecture search [6], a common trend today is to offer end-users a portfolio of pre-trained models with the same backbone topology but variable size and hence a different latency-accuracy trade-off [7][8][9][10][11]. One can pick the implementation that best fits the available computing architecture and the application requirements, thereby reducing design time.…”
Section: Introduction (mentioning, confidence: 99%)
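
A minimal sketch of the selection step described in the quoted passage above, assuming a hypothetical portfolio of backbone variants with pre-measured latencies: pick the most accurate variant that fits the application's latency budget. Model names and numbers are invented for illustration.

# Sketch of picking from a portfolio of pre-trained variants that share a
# backbone but differ in size. The table values are illustrative.

PORTFOLIO = [
    # (name, top-1 accuracy, measured latency in ms on the target CPU)
    ("backbone-small",  0.68, 12.0),
    ("backbone-medium", 0.73, 28.0),
    ("backbone-large",  0.76, 55.0),
]

def pick_model(latency_budget_ms):
    """Return the most accurate variant that meets the latency budget."""
    feasible = [m for m in PORTFOLIO if m[2] <= latency_budget_ms]
    if not feasible:
        return None  # no variant fits; the caller must relax the budget
    return max(feasible, key=lambda m: m[1])

print(pick_model(30.0))  # -> ('backbone-medium', 0.73, 28.0)
print(pick_model(10.0))  # -> None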