2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)
DOI: 10.23919/date48585.2020.9116235

Optimising Resource Management for Embedded Machine Learning

Abstract: Machine learning inference is increasingly being executed locally on mobile and embedded platforms, due to the clear advantages in latency, privacy and connectivity. In this paper, we present approaches for online resource management in heterogeneous multi-core systems and show how they can be applied to optimise the performance of machine learning workloads. Performance can be defined using platform-dependent (e.g. speed, energy) and platform-independent (accuracy, confidence) metrics. In particular, we show …
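
The abstract's split between platform-dependent metrics (speed, energy) and platform-independent metrics (accuracy, confidence) can be made concrete with a small sketch. The following is a minimal, hypothetical illustration in Python, not the paper's implementation: a runtime manager that, given a set of profiled operating points, picks the most accurate one that still meets a latency target. All names and numbers are invented for illustration.

from dataclasses import dataclass

# Hypothetical sketch -- not the paper's method. A runtime manager picks,
# from profiled operating points, the most accurate configuration that
# still meets the latency target, breaking ties by lower energy.

@dataclass
class OperatingPoint:
    cores: int          # cores allocated to the workload (platform-dependent)
    freq_mhz: int       # operating frequency (platform-dependent)
    model: str          # which DNN variant is deployed
    latency_ms: float   # profiled latency (platform-dependent metric)
    energy_mj: float    # profiled energy per inference (platform-dependent metric)
    accuracy: float     # top-1 accuracy (platform-independent metric)

def select_point(points, latency_target_ms):
    """Most accurate feasible point, ties broken by lower energy; None if infeasible."""
    feasible = [p for p in points if p.latency_ms <= latency_target_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda p: (p.accuracy, -p.energy_mj))

# Invented numbers, purely for illustration.
points = [
    OperatingPoint(4, 1100, "full",   48.0, 120.0, 0.76),
    OperatingPoint(2, 1100, "full",   85.0,  90.0, 0.76),
    OperatingPoint(4, 1100, "pruned", 30.0,  70.0, 0.72),
]
print(select_point(points, latency_target_ms=50.0))  # -> the 48 ms "full" point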

Cited by 12 publications (10 citation statements). References 28 publications.

“…These approaches produce a static DNN architecture with fixed parameters for the target application's performance requirements, based on measurements on fixed hardware resources. However, since the available hardware resources change dynamically at runtime, performance requirements can be violated [19,20]. Fig. 1 illustrates these problems using experimental results from a Jetson Xavier NX, where bar A represents an optimized DNN model executing on all GPU cores at 1.1 GHz to deliver a 50 ms target latency.…”
Section: Introduction
confidence: 99%
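
A first-order sanity check of the quoted Jetson Xavier NX scenario: if latency is assumed to scale inversely with allocated cores times clock frequency (a deliberate simplification; real GPU behaviour is more complex), a model tuned to 50 ms at design time misses its target once the frequency drops or cores are shared. The core counts and frequencies below are illustrative assumptions, not measurements from the cited paper.

# Deliberately simplified model: latency assumed inversely proportional to
# (allocated cores x clock frequency). All numbers are illustrative.
def estimated_latency_ms(base_ms, base_cores, base_ghz, cores, ghz):
    return base_ms * (base_cores * base_ghz) / (cores * ghz)

cases = {
    "A (design-time: all cores, 1.1 GHz)": estimated_latency_ms(50.0, 6, 1.1, 6, 1.1),
    "B (frequency drops to 0.6 GHz)":      estimated_latency_ms(50.0, 6, 1.1, 6, 0.6),
    "C (half the cores shared away)":      estimated_latency_ms(50.0, 6, 1.1, 3, 1.1),
}
for name, lat in cases.items():
    verdict = "meets" if lat <= 50.0 else "violates"
    print(f"bar {name}: {lat:.0f} ms -> {verdict} the 50 ms target")
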
“…Since both software performance requirements and hardware resource availability can change dynamically at runtime [19,20], various dynamic DNNs [21,23–25] have been proposed to address this issue. These dynamic DNNs contain various sub-networks that each have a different accuracy and latency.…”
Section: Introduction
confidence: 99%
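
As a generic illustration of the dynamic-DNN idea described in this statement (in the spirit of early-exit or slimmable networks), a runtime can keep a table of sub-networks with profiled accuracy/latency pairs and dispatch to the most accurate one that fits the current latency budget. This sketch is not taken from any of the cited works; the sub-network names and figures are hypothetical.

# Generic dynamic-DNN dispatch: choose the most accurate sub-network that
# fits the current latency budget. Names and figures are hypothetical;
# latencies would be re-profiled as resource availability changes.
sub_networks = [
    {"name": "exit-1", "accuracy": 0.62, "latency_ms": 12.0},
    {"name": "exit-2", "accuracy": 0.70, "latency_ms": 27.0},
    {"name": "exit-3", "accuracy": 0.75, "latency_ms": 49.0},
    {"name": "full",   "accuracy": 0.77, "latency_ms": 80.0},
]

def pick_subnetwork(nets, budget_ms):
    feasible = [n for n in nets if n["latency_ms"] <= budget_ms]
    # Fall back to the cheapest sub-network if nothing fits the budget.
    return max(feasible, key=lambda n: n["accuracy"]) if feasible \
        else min(nets, key=lambda n: n["latency_ms"])

print(pick_subnetwork(sub_networks, budget_ms=50.0))  # -> "exit-3"
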
“…These approaches, which are static DNNs, provide an optimal Transformer architecture for the target application's performance requirements based on measurements of fixed hardware resources. However, embedded devices often run many applications on several heterogeneous cores, and the resources a Transformer was optimized for may not be available at run-time [3]. Therefore, the performance requirements of the application can be violated.…”
Section: Introduction
confidence: 99%
“…Fig. 1 illustrates these problems using experimental results from a Jetson Xavier NX, where bar A represents an optimized DNN model executing on all GPU cores at 1.1 GHz to deliver a 50 ms target latency. However, under the same target latency, the design-time optimization is invalid if the operating frequency changes or the DNN shares GPU cores with other applications at runtime, as shown by bars B and C. Since both software performance requirements and hardware resource availability can change dynamically at runtime [19,20], various dynamic DNNs [21,23–25] have been proposed to address this issue. These dynamic DNNs contain various sub-networks that each have a different accuracy and latency.…”
Section: Introduction
confidence: 99%