Deep neural networks have demonstrated extraordinary capabilities in many computer vision applications. However, their complex architectures challenge efficient real-time deployment and incur significant computation and energy costs. These challenges can be mitigated through optimizations such as network compression. This paper provides a survey of two types of network compression: pruning and quantization. We compare current techniques, analyze their strengths and weaknesses, provide guidance for compressing networks, and discuss possible future compression techniques.
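As a rough illustration of the two compression families the survey covers, the sketch below shows one common instance of each: magnitude-based weight pruning and uniform post-training quantization. The function names, sparsity target, and bit width are illustrative assumptions, not methods taken from the survey.

```python
# Illustrative sketch (not the survey's specific methods): magnitude pruning
# followed by uniform post-training quantization, using NumPy only.
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` fraction is pruned."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def uniform_quantize(weights: np.ndarray, num_bits: int = 8):
    """Quantize weights to signed integers with a single per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(weights)) / qmax
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale  # dequantize with q * scale

# Example: prune 50% of a random layer's weights, then quantize to int8.
w = np.random.randn(256, 256).astype(np.float32)
w_pruned = magnitude_prune(w, sparsity=0.5)
q, scale = uniform_quantize(w_pruned)
```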
Neural network processors and accelerators are domain-specific architectures deployed to meet the high computational requirements of deep learning algorithms. This paper proposes TCX, a new instruction set extension for tensor computing that augments RISC instructions with variable-length tensor extensions. It features a multi-dimensional register file, dimension registers, and fully generic tensor instructions. It can be integrated seamlessly into existing RISC ISAs and provides software compatibility across scalable hardware implementations. We present a tensor accelerator implementation of the tensor extensions using an out-of-order RISC microarchitecture. The accelerator scales from several hundred to tens of thousands of computation units. An optimized register renaming mechanism allows many physical tensor registers without requiring architectural support for large tensor register names. We describe new tensor load and store instructions that reduce bandwidth requirements by using tensor dimension registers. Implementations may balance data bandwidth and computation utilization for different types of tensor computations such as element-wise, depth-wise, and matrix multiplication. We characterize the computation precision of tensor operations to balance area, generality, and accuracy loss for several well-known neural networks. The TCX processor runs at 1 GHz and sustains 8.2 tera operations per second (TOPS) using a 4096 multiply-accumulate compute unit. It occupies 12.8 mm² and dissipates 0.46 W/TOPS in TSMC 28 nm technology.
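The abstract does not give the TCX mnemonics or encodings, so the following Python sketch only models the behavior being described: a dimension register carries the shape and strides of a tile, so a single tensor load or multiply-accumulate "instruction" can cover a whole tile instead of many scalar operations. The names `DimReg`, `tensor_load`, and `tensor_macc` are hypothetical, purely for illustration.

```python
# Behavioral sketch (hypothetical, not the actual TCX encoding): dimension
# registers describe a strided tile so one load/MAC covers the whole tile,
# reducing instruction count and address-generation overhead.
from dataclasses import dataclass
import numpy as np

@dataclass
class DimReg:
    """Hypothetical dimension register: shape and element strides of a tile."""
    shape: tuple
    strides: tuple

def tensor_load(memory: np.ndarray, base: int, dim: DimReg) -> np.ndarray:
    """Gather the tile described by a dimension register into a tensor register."""
    return np.lib.stride_tricks.as_strided(
        memory[base:], shape=dim.shape,
        strides=tuple(s * memory.itemsize for s in dim.strides)).copy()

def tensor_macc(acc: np.ndarray, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Generic tensor multiply-accumulate; here, a tiled matrix multiplication."""
    return acc + a @ b

# Example: one "instruction" loads a 4x8 tile of A and an 8x4 tile of B,
# then accumulates their product into a 4x4 accumulator tile.
mem = np.arange(1024, dtype=np.float32)
A = tensor_load(mem, base=0,   dim=DimReg(shape=(4, 8), strides=(32, 1)))
B = tensor_load(mem, base=256, dim=DimReg(shape=(8, 4), strides=(16, 1)))
C = tensor_macc(np.zeros((4, 4), dtype=np.float32), A, B)
```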