The adoption of transformer networks has surged across a wide range of AI applications. However, their increased computational complexity, stemming primarily from the self-attention mechanism, constrains their capability and speed in much the same way that convolution operations constrain Convolutional Neural Networks (CNNs). The self-attention algorithm, specifically its Matrix-matrix Multiplication (MatMul) operations, demands substantial memory and computation, thereby restricting the overall performance of the transformer. This paper introduces an efficient hardware accelerator for the transformer network, leveraging memristor-based in-memory computing. The design targets the memory bottleneck associated with MatMul operations in the self-attention process, utilizing approximate analog computation and the highly parallel computation enabled by the memristor crossbar architecture. This approach reduces the number of Multiply-Accumulate (MAC) operations in the transformer network by approximately 10 times while maintaining 93.37% accuracy on the MNIST dataset, as validated with the comprehensive circuit simulator NeuroSim 3.0. Simulation results indicate an area utilization of 6895.7 μm², a latency of 15.52 seconds, an energy consumption of 3 mJ, and a leakage power of 59.55 μW. The methodology outlined in this paper represents a substantial stride towards a hardware-friendly transformer architecture for edge devices, poised to achieve real-time performance.
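To make concrete which MatMul operations the accelerator targets, the following is a minimal single-head self-attention sketch in NumPy. It is purely illustrative: the sequence length, embedding dimension, and weight names are assumptions, and the code shows only where the projection, score, and context MatMuls arise, not the paper's crossbar mapping.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Minimal single-head self-attention, highlighting the MatMul
    operations that dominate memory traffic and compute."""
    # Projection MatMuls: (n, d) x (d, d) for Q, K, and V
    Q = x @ Wq
    K = x @ Wk
    V = x @ Wv

    d = Q.shape[-1]
    # Score MatMul: (n, d) x (d, n) -> (n, n)
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Context MatMul: (n, n) x (n, d) -> (n, d)
    return weights @ V

# Illustrative sizes: sequence length 64, embedding dimension 64
n, d = 64, 64
rng = np.random.default_rng(0)
x = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (64, 64)
```

In an in-memory computing realization, each of these MatMuls would be evaluated as analog MAC operations on a memristor crossbar rather than as digital matrix products, which is the source of the MAC reduction reported above.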