The end of Moore’s Law, combined with growing data privacy concerns, is forcing machine learning (ML) to shift from the cloud to the deep edge. In next-generation ML systems, inference and part of the training process will be performed at the edge, while the cloud remains responsible for major model updates. This new computing paradigm, called federated learning (FL), reduces the burden on the cloud and network infrastructure while increasing data privacy. Recent advances have enabled the inference pass of quantized artificial neural networks (ANNs) on Arm Cortex-M and RISC-V microcontroller units (MCUs). Nevertheless, training remains confined to the cloud, requiring the transmission of large volumes of private data over the network and leading to unpredictable delays when ML applications attempt to adapt to adversarial environments. To fill this gap, we make the first attempt to evaluate the feasibility of ANN training on Arm Cortex-M MCUs. Among the available optimization algorithms, stochastic gradient descent (SGD) offers the best trade-off between accuracy, memory footprint, and latency. However, its original form and the variants available in the literature still do not meet the stringent resource constraints of Arm Cortex-M MCUs. We propose L-SGD, a lightweight implementation of SGD optimized for maximum speed and minimal memory footprint on this class of MCUs. We developed a floating-point version and a version that operates on quantized weights. For a fully-connected ANN trained on the MNIST dataset, L-SGD (float-32) is 4.20× faster than SGD while requiring only 2.80% of the memory, with negligible accuracy loss. Results also show that quantized training is still infeasible for training an ANN from scratch, but it is a lightweight solution for performing minor model fixes and counteracting the fairness problem in typical FL systems.
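For reference, the sketch below illustrates the textbook SGD baseline against which L-SGD is compared: a plain per-layer weight update for a fully-connected layer, written in C as would be natural on a Cortex-M target. The function and parameter names are illustrative assumptions and do not reflect the actual L-SGD implementation or its memory and latency optimizations.

#include <stddef.h>

/* Textbook SGD update for one fully-connected layer:
 *   w[i][j] -= lr * grad_out[i] * input[j];
 *   b[i]    -= lr * grad_out[i];
 * Baseline only; L-SGD's optimizations are not represented here. */
static void sgd_update_dense(float *weights,        /* [n_out * n_in], row-major */
                             float *biases,         /* [n_out] */
                             const float *input,    /* [n_in] layer input activations */
                             const float *grad_out, /* [n_out] gradient w.r.t. pre-activation */
                             size_t n_in, size_t n_out,
                             float lr)
{
    for (size_t i = 0; i < n_out; ++i) {
        const float g = lr * grad_out[i];
        float *w_row = &weights[i * n_in];
        for (size_t j = 0; j < n_in; ++j) {
            w_row[j] -= g * input[j];   /* weight update */
        }
        biases[i] -= g;                  /* bias update */
    }
}

Calling such an update once per sample (or with gradients accumulated over a mini-batch) reproduces the standard SGD step whose speed and memory footprint L-SGD targets.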