2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)
DOI: 10.23919/date.2019.8714922

CapsAcc: An Efficient Hardware Accelerator for CapsuleNets with Data Reuse

Abstract: Deep Neural Networks (DNNs) have been widely deployed for many Machine Learning applications. Recently, CapsuleNets have overtaken traditional DNNs, because of their improved generalization ability due to the multi-dimensional capsules, in contrast to the single-dimensional neurons. Consequently, CapsuleNets also require extremely intense matrix computations, making it a gigantic challenge to achieve high performance. In this paper, we propose CapsAcc, the first specialized CMOS-based hardware architecture to …
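To make concrete why the capsule formulation is so compute-hungry, here is a minimal NumPy sketch; the layer sizes are assumptions, loosely following the CapsuleNet commonly evaluated on MNIST. Every (input capsule, output capsule) pair requires a full matrix-vector product to form its prediction vector, rather than the single scalar multiply-accumulate per weight of a conventional neuron.

```python
import numpy as np

# Hypothetical sizes, loosely following the MNIST CapsuleNet:
# 1152 input capsules of dimension 8, 10 output capsules of dimension 16.
NUM_IN, DIM_IN = 1152, 8
NUM_OUT, DIM_OUT = 10, 16

rng = np.random.default_rng(0)
u = rng.standard_normal((NUM_IN, DIM_IN))                    # input capsule vectors
W = rng.standard_normal((NUM_IN, NUM_OUT, DIM_OUT, DIM_IN))  # one matrix per (i, j) pair

# Prediction vectors u_hat[i, j] = W[i, j] @ u[i]: a full matrix-vector
# product per capsule pair, instead of one scalar MAC per weight as in
# a conventional fully connected layer.
u_hat = np.einsum('ijkl,il->ijk', W, u)
print(u_hat.shape)  # (1152, 10, 16) -> roughly 1.5M MACs for this single step
```

Even this single transformation step needs about 1.5 million multiply-accumulates, which is the kind of workload a specialized accelerator targets.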


Cited by 26 publications (23 citation statements) · References 12 publications
“…Since the linear calculation deals with a huge amount of data, the optimal memory hierarchy is also different. As a result, a growing number of vendors are releasing their own specialized accelerators (Jouppi et al. 2017; Zhang et al. 2016; Marchisio et al. 2019). These accelerators offer superior performance and energy efficiency in deep learning tasks, and they also have simpler yet more diverse architectures than general-purpose processors, as well as different memory subsystems. Meanwhile, beyond these specially designed accelerators, more researchers are focusing on accelerator architectures with better universality for scalar, vector, matrix, and tensor computation instead of only convolution (Guo et al. 2020), which also brings challenges to compiler design.…”
Section: Deep Learning Accelerator
confidence: 99%
“…To enable the use of DNNs in energy-/power-constrained scenarios as well as in high-performance applications, several different hardware architectures for DNN acceleration have been proposed. While each accelerator provides some unique features and supports certain dataflows more efficiently, systolic array-based designs are considered among the most promising ones [18,23,37,61].…”
Section: Hardware Accelerators For Deep Neural Network
confidence: 99%
“…Moreover, systolic arrays are intrinsically efficient at performing matrix multiplication, the core operation of neural networks. Therefore, many accelerators use these arrays at their core for accelerating neural networks [18,23,37,61]. The Tensor Processing Unit (TPU), a DNN accelerator that is currently in use in the datacenters of Google, is a systolic array-based architecture that uses an array of 256 × 256 multiply-and-accumulate (MAC) units.…”
Section: Hardware Accelerators For Deep Neural Network
confidence: 99%
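As an illustration of the dataflow behind such designs, the following toy simulation computes a matrix product on an output-stationary grid of MAC units, with operands skewed by one cycle per row and column as they stream through the array. This is a sketch of the schedule, not the TPU's actual microarchitecture; all names are illustrative.

```python
import numpy as np

def systolic_matmul(A, B):
    """Toy cycle-by-cycle simulation of an output-stationary systolic
    array: PE (i, j) accumulates C[i, j] while rows of A stream in from
    the left and columns of B from the top, each skewed by one cycle
    per row/column."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N))
    # Run until the last (most skewed) operand drains through the array.
    for t in range(K + M + N - 2):
        for i in range(M):
            for j in range(N):
                k = t - i - j          # operand index reaching PE (i, j) at cycle t
                if 0 <= k < K:
                    C[i, j] += A[i, k] * B[k, j]
    return C

A = np.arange(6).reshape(2, 3).astype(float)
B = np.arange(12).reshape(3, 4).astype(float)
assert np.allclose(systolic_matmul(A, B), A @ B)
```

On real hardware each PE holds one accumulator and the skewing is done by input buffers; the triple loop here only mimics that schedule functionally.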
“…These challenges are addressed by their specialized accelerators. For example, CapsAcc [46] adopts a data reuse policy to efficiently process the routing-by-agreement algorithm on a systolic array-based accelerator for CapsuleNets, and GANAX [76] proposes a unified MIMD-SIMD design for concurrent execution of GANs.…”
Section: Current Trends
confidence: 99%
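For reference, below is a minimal NumPy sketch of the routing-by-agreement loop that CapsAcc accelerates, following the dynamic routing algorithm of Sabour et al.; the function and variable names are illustrative, and the data-reuse policy itself is specific to the CapsAcc paper.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Squashing nonlinearity: keeps the vector's direction, maps its norm into [0, 1)."""
    sq = np.sum(s * s, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def routing_by_agreement(u_hat, iters=3):
    """u_hat: (num_in, num_out, dim_out) prediction vectors.
    Iteratively re-weights each prediction by its agreement with the
    current output capsule."""
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                           # routing logits
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over output capsules
        s = np.einsum('ij,ijk->jk', c, u_hat)                 # weighted sum per output capsule
        v = squash(s)                                         # output capsule vectors
        b = b + np.einsum('ijk,jk->ij', u_hat, v)             # agreement update
    return v
```

Because `u_hat` is read again in every routing iteration, keeping it stationary near the compute array is presumably the kind of operand reuse the quoted statement alludes to.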