Human Activity Recognition on Microcontrollers with Quantized and Adaptive Deep Neural Networks

Daghero, Francesco; Burrello, Alessio; Xie, Chen; Castellano, Marco; Gandolfi, Luca; Calimera, Andrea; Macii, Enrico; Poncino, Massimo; Pagliari, Daniele Jahier

doi:10.1145/3542819

Cited by 19 publications

(7 citation statements)

References 46 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Dynamic (or adaptive) inference techniques, including this work, are designed to overcome these limitations. They allow the deployment of a single model able to adapt its complexity at runtime, while keeping the memory overhead under control [24]- [26]. In practice, a dynamic model can be partially turned off when the external conditions require it, or when the processing input's difficulty allows it [22].…”

Section: Static and Dynamic ML Optimizationsmentioning

confidence: 99%

“…Literature works differ mainly in how they decompose the model. For instance, the authors of [24]- [26], [28] obtain a single sub-model by selectively deactivating a subset of the layers or channels of a network, or truncating the bit-width used to represent parameters. Other works extend the approach more than two sub-models [25], [29] or enhance the stopping criterion with class-aware thresholds [22].…”

Section: A Dynamic Inferencementioning

confidence: 99%

See 1 more Smart Citation

Dynamic Decision Tree Ensembles for Energy-Efficient Inference on IoT Edge Nodes

Daghero

Burrello

Macii

et al. 2024

IEEE Internet Things J.

View full text Add to dashboard Cite

With the increasing popularity of Internet of Things (IoT) devices, there is a growing need for energy-efficient Machine Learning (ML) models that can run on constrained edge nodes. Decision tree ensembles, such as Random Forests (RFs) and Gradient Boosting (GBTs), are particularly suited for this task, given their relatively low complexity compared to other alternatives. However, their inference time and energy costs are still significant for edge hardware.Given that said costs grow linearly with the ensemble size, this paper proposes the use of dynamic ensembles, that adjust the number of executed trees based both on a latency/energy target and on the complexity of the processed input, to trade-off computational cost and accuracy. We focus on deploying these algorithms on multi-core low-power IoT devices, designing a tool that automatically converts a Python ensemble into optimized C code, and exploring several optimizations that account for the available parallelism and memory hierarchy. We extensively benchmark both static and dynamic RFs and GBTs on three state-of-the-art IoT-relevant datasets, using an 8-core ultra-lowpower System-on-Chip (SoC), GAP8, as the target platform. Thanks to the proposed early-stopping mechanisms, we achieve an energy reduction of up to 37.9% with respect to static GBTs (8.82 uJ vs 14.20 uJ per inference) and 41.7% with respect to static RFs (2.86 uJ vs 4.90 uJ per inference), without losing accuracy compared to the static model.

show abstract

Section: Static and Dynamic ML Optimizationsmentioning

confidence: 99%

Section: A Dynamic Inferencementioning

confidence: 99%

Dynamic Decision Tree Ensembles for Energy-Efficient Inference on IoT Edge Nodes

Daghero

Burrello

Macii

et al. 2024

IEEE Internet Things J.

View full text Add to dashboard Cite

show abstract

“… The encoding strategy used for the devised CNN architecture has reduced the search space of the architecture, which prevents the structure from expressing the diversity of the assigned tasks. [15] Human Activity Recognition A simple grid search algorithm has been explored how to optimize the models' hyper-parameters. To achieve a good trade-off between classification score and memory occupancy, a combination of sub-byte and mixed precision quantization was used.…”

Section: Related Workmentioning

confidence: 99%

“…In addition, practitioners of deep learning encounter difficulty when it comes to manually building deep models and determining suitable configurations (e.g., model layers and operation types) through trial and error. Various steps are involved in feeding domain knowledge into DL, including Feature Engineering (FE) [13] , model generation [14] , and model deployment [15] , [16] . Because CNNs are based on layers, they allow the flexibility of adding or removing layers based on the training phase, which is then used in inference (classification) to classify the data.…”

Section: Introductionmentioning

confidence: 99%

AUTO-HAR: An adaptive human activity recognition framework using an automated CNN architecture design

Ismail¹,

Alsalamah²,

Hassan³

et al. 2023

Heliyon

View full text Add to dashboard Cite

“…Daghero et al applied Binary Neural Networks (BNNs) to HAR to decrease network complexity via an extreme form of quantization [6]; indeed, by using BNNs the precision of data format, both weights and layers input/output, is reduced to 1-bit precision. Authors propose a BNN inference library that targets RISC-V processors.Subsequently, authors extended their work [4,5] by proposing a set of efficient one-dimensional convolutional neural networks (1D CNNs) and testing optimization techniques such as sub-type and mixed-precision quantization. The aim was to find a good trade-off between accuracy and memory occupation.…”

Section: Related Workmentioning

confidence: 99%

Energy Efficiency of Deep Learning Compression Techniques in Wearable Human Activity Recognition

Contoli

Lattanzi

2023

IFIP Advances in Information and Communication Technology

View full text Add to dashboard Cite

Deploying deep learning (DL) models onto low-power devices for Human Activity Recognition (HAR) purposes is gaining momentum because of the pervasive adoption of wearable sensor devices. However, the outcome of such deployment needs exploration not only because the topic is still in its infancy, but also because of the wide combination between low-power devices, deep models, and available deployment strategies. We have investigated the outcome of the application of three compression techniques, namely lite conversion, dynamic quantization, and full-integer quantization, that allow the deployment of deep models on low-power devices. This paper describes how those three compression techniques impact accuracy and energy consumption on an ESP32 device. In terms of accuracy, the full-integer technique incurs an accuracy drop between 2% and 3%, whereas the dynamic quantization and the lite conversion result in a negligible accuracy drop. In terms of power efficiency, dynamic and full-integer quantization allow for saving almost 30% of energy. The adoption of one of those two quantization techniques is recommended to obtain an executable network model, and we advise the adoption of the dynamic quantization given the negligible accuracy drop.

show abstract

Human Activity Recognition on Microcontrollers with Quantized and Adaptive Deep Neural Networks

Cited by 19 publications

References 46 publications

Dynamic Decision Tree Ensembles for Energy-Efficient Inference on IoT Edge Nodes

Dynamic Decision Tree Ensembles for Energy-Efficient Inference on IoT Edge Nodes

AUTO-HAR: An adaptive human activity recognition framework using an automated CNN architecture design

Energy Efficiency of Deep Learning Compression Techniques in Wearable Human Activity Recognition

Contact Info

Product

Resources

About