MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge

Yuan, Geng; Ma, Xiaolong; Niu, Wei; Li, Zhengang; Kong, Zhenglun; Liu, Ning; Gong, Yifan; Zhan, Zheng; He, Chaoyang; Jin, Qing; Wang, Siyue; Qin, Minghai; Ren, Bin; Wang, Yanzhi; Liu, Sijia; Lin, Xue

doi:10.48550/arxiv.2110.14032

Cited by 2 publications

(2 citation statements)

References 32 publications


“…Pattern-based pruning [50,59,78] alleviates the shortcomings of prior works by incorporating the benefits from fine-grained pruning while maintaining structures that can be exploited for hardware accelerations with the help of compiler. Pattern-based pruning is a combination of kernel pattern pruning and connectivity pruning as shown in Fig.…”

Section: Background and Related Work (mentioning)

confidence: 99%

Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration

Gong, Yuan, Zhan, et al. 2022

ACM Trans. Des. Autom. Electron. Syst.

Self Cite


Weight pruning is an effective model compression technique to tackle the challenges of achieving real-time deep neural network (DNN) inference on mobile devices. However, prior pruning schemes have limited application scenarios due to accuracy degradation, difficulty in leveraging hardware acceleration, and/or restriction on certain types of DNN layers. In this paper, we propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations that are applicable to any type of DNN layer while achieving high accuracy and hardware inference performance. With the flexibility of applying different pruning schemes to different layers enabled by our compiler optimizations, we further probe into the new problem of determining the best-suited pruning scheme considering the different acceleration and accuracy performance of various pruning schemes. Two pruning scheme mapping methods, one search-based and the other rule-based, are proposed to automatically derive the best-suited pruning regularity and block size for each layer of any given DNN. Experimental results demonstrate that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework with up to 2.48× and 1.73× DNN inference acceleration on the CIFAR-10 and ImageNet datasets without accuracy loss.


“…To address this challenge, we envision that the lightweighted machine learning engine is needed to reduce the memory consumption for on-device training. For example, FedGKT [30], MEST [37], and TinyFL [38] are potential methods to reduce the training memory footprint for efficient on-device learning. To avoid catastrophic forgetting we also envision the use of clustering approaches that identify and store few core datasets from each time interval or leveraging IoT "Hubs" that can store non-sensitive/public datasets to inject memory in the training system.…”

Section: G Lifelong/continual Learning (mentioning)

confidence: 99%

Federated Learning for Internet of Things: Applications, Challenges, and Opportunities

Zhang, Gao, He, et al. 2021

Preprint

Self Cite


Billions of IoT devices will be deployed in the near future, taking advantage of the faster Internet speed and the possibility of orders of magnitude more endpoints brought by 5G/6G. With the blooming of IoT devices, vast quantities of data that may contain private information of users will be generated. The high communication and storage costs, mixed with privacy concerns, will increasingly be challenging the traditional ecosystem of centralized over-the-cloud learning and processing for IoT platforms. Federated Learning (FL) has emerged as the most promising alternative approach to this problem. In FL, training of data-driven machine learning models is an act of collaboration between multiple clients without requiring the data to be brought to a central point, hence alleviating communication and storage costs and providing a great degree of user-level privacy. We discuss the opportunities and challenges of FL for IoT platforms, as well as how it can enable future IoT applications.
