2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)
DOI: 10.23919/date.2019.8714922

CapsAcc: An Efficient Hardware Accelerator for CapsuleNets with Data Reuse

Abstract: Deep Neural Networks (DNNs) have been widely deployed for many Machine Learning applications. Recently, CapsuleNets have overtaken traditional DNNs, because of their improved generalization ability due to the multi-dimensional capsules, in contrast to the single-dimensional neurons. Consequently, CapsuleNets also require extremely intense matrix computations, making it a gigantic challenge to achieve high performance. In this paper, we propose CapsAcc, the first specialized CMOS-based hardware architecture to …
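To make concrete why the capsule formulation is so compute-hungry, here is a minimal NumPy sketch; the layer sizes are assumptions, loosely following the CapsuleNet commonly evaluated on MNIST. Every (input capsule, output capsule) pair requires a full matrix-vector product to form its prediction vector, rather than the single scalar multiply-accumulate per weight of a conventional neuron.

```python
import numpy as np

# Hypothetical sizes, loosely following the MNIST CapsuleNet:
# 1152 input capsules of dimension 8, 10 output capsules of dimension 16.
NUM_IN, DIM_IN = 1152, 8
NUM_OUT, DIM_OUT = 10, 16

rng = np.random.default_rng(0)
u = rng.standard_normal((NUM_IN, DIM_IN))                    # input capsule vectors
W = rng.standard_normal((NUM_IN, NUM_OUT, DIM_OUT, DIM_IN))  # one matrix per (i, j) pair

# Prediction vectors u_hat[i, j] = W[i, j] @ u[i]: a full matrix-vector
# product per capsule pair, instead of one scalar MAC per weight as in
# a conventional fully connected layer.
u_hat = np.einsum('ijkl,il->ijk', W, u)
print(u_hat.shape)  # (1152, 10, 16) -> roughly 1.5M MACs for this single step
```

Even this single transformation step needs about 1.5 million multiply-accumulates, which is the kind of workload a specialized accelerator targets.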


Cited by 26 publications (23 citation statements) · References 12 publications
“…Since the linear calculation deals with a huge amount of data, the optimal memory hierarchy is also different. As a result, a growing number of vendors are releasing their own specialized accelerators (Jouppi et al. 2017; Zhang et al. 2016; Marchisio et al. 2019). These accelerators offer superior performance and energy efficiency in deep learning tasks, and they also have simpler yet more diverse architectures than general-purpose processors, as well as different memory subsystems. Meanwhile, beyond these specially designed accelerators, more researchers are focusing on accelerator architectures with better universality for scalar, vector, matrix, and tensor computation instead of only convolution (Guo et al. 2020), which also brings challenges to compiler design.…”
Section: Deep Learning Accelerator
confidence: 99%
“…To enable the use of DNNs in energy-/power-constrained scenarios as well as in high-performance applications, several different hardware architectures for DNN acceleration have been proposed. While each accelerator provides some unique features and supports certain dataflows more efficiently, systolic array-based designs are considered among the most promising ones [18,23,37,61].…”
Section: Hardware Accelerators For Deep Neural Network
confidence: 99%
“…Moreover, systolic arrays are intrinsically efficient at performing matrix multiplication, the core operation of neural networks. Therefore, many accelerators use these arrays at their core for accelerating neural networks [18,23,37,61]. The Tensor Processing Unit (TPU), a DNN accelerator that is currently in use in the datacenters of Google, is a systolic array-based architecture that uses an array of 256 × 256 multiply-and-accumulate (MAC) units.…”
Section: Hardware Accelerators For Deep Neural Network
confidence: 99%
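As an illustration of the dataflow behind such designs, the following toy simulation computes a matrix product on an output-stationary grid of MAC units, with operands skewed by one cycle per row and column as they stream through the array. This is a sketch of the schedule, not the TPU's actual microarchitecture; all names are illustrative.

```python
import numpy as np

def systolic_matmul(A, B):
    """Toy cycle-by-cycle simulation of an output-stationary systolic
    array: PE (i, j) accumulates C[i, j] while rows of A stream in from
    the left and columns of B from the top, each skewed by one cycle
    per row/column."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N))
    # Run until the last (most skewed) operand drains through the array.
    for t in range(K + M + N - 2):
        for i in range(M):
            for j in range(N):
                k = t - i - j          # operand index reaching PE (i, j) at cycle t
                if 0 <= k < K:
                    C[i, j] += A[i, k] * B[k, j]
    return C

A = np.arange(6).reshape(2, 3).astype(float)
B = np.arange(12).reshape(3, 4).astype(float)
assert np.allclose(systolic_matmul(A, B), A @ B)
```

On real hardware each PE holds one accumulator and the skewing is done by input buffers; the triple loop here only mimics that schedule functionally.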
“…These challenges are addressed by their specialized accelerators. For example, CapsAcc [46] adopts a data reuse policy to efficiently process the routing-by-agreement algorithm on a systolic array-based accelerator for CapsuleNets, and GANAX [76] proposes a unified MIMD-SIMD design for concurrent execution of GANs.…”
Section: Current Trends
confidence: 99%
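For reference, below is a minimal NumPy sketch of the routing-by-agreement loop that CapsAcc accelerates, following the dynamic routing algorithm of Sabour et al.; the function and variable names are illustrative, and the data-reuse policy itself is specific to the CapsAcc paper.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Squashing nonlinearity: keeps the vector's direction, maps its norm into [0, 1)."""
    sq = np.sum(s * s, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def routing_by_agreement(u_hat, iters=3):
    """u_hat: (num_in, num_out, dim_out) prediction vectors.
    Iteratively re-weights each prediction by its agreement with the
    current output capsule."""
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                           # routing logits
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over output capsules
        s = np.einsum('ij,ijk->jk', c, u_hat)                 # weighted sum per output capsule
        v = squash(s)                                         # output capsule vectors
        b = b + np.einsum('ijk,jk->ij', u_hat, v)             # agreement update
    return v
```

Because `u_hat` is read again in every routing iteration, keeping it stationary near the compute array is presumably the kind of operand reuse the quoted statement alludes to.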