2014 IEEE 32nd International Conference on Computer Design (ICCD)
DOI: 10.1109/iccd.2014.6974700

Accelerating divergent applications on SIMD architectures using neural networks

Abstract: In this work, we investigate neural-network-based solutions to the well-known problem of branch divergence in Single Instruction Multiple Data (SIMD) architectures. Our approach isolates code regions with performance degradation due to branch divergence, trains neural networks (NNs) offline to approximate these regions, and replaces the regions with their NN approximations. By directly manipulating source code, this platform-agnostic methodology translates control flow into nondivergent computation, trading off…
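A minimal sketch of the idea in Python with NumPy: a per-element branch (the divergent region) is replaced by a small multilayer perceptron evaluated identically for every lane, so data-dependent control flow disappears. The toy kernel and the placeholder weights are illustrative assumptions, not the authors' code; a trained approximator would come from the offline training step.

import numpy as np

def divergent_kernel(x):
    # Original region: a data-dependent branch. On SIMD hardware, lanes whose
    # predicate differs are serialized (branch divergence).
    out = np.empty_like(x)
    for i, v in enumerate(x):
        if v > 0.0:
            out[i] = np.sin(v) * v
        else:
            out[i] = np.exp(v) - 1.0
    return out

# Placeholder MLP weights; in the paper's flow these would be trained offline.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(1, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

def nn_kernel(x):
    # Replacement region: only multiply-adds and one activation, identical
    # work in every lane, hence no divergence.
    h = np.tanh(x[:, None] @ W1 + b1)   # hidden layer
    return (h @ W2 + b2).ravel()        # linear output

x = rng.uniform(-2.0, 2.0, size=8)
print(divergent_kernel(x))  # exact result, divergent control flow
print(nn_kernel(x))         # approximate result, branch-free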

Cited by 20 publications (4 citation statements) | References 23 publications
“…A floating-point unit (with reduced precision), or a fixed-point unit, can be chosen carefully by a graphics processing unit (GPU) architecture to save power [9]. The branch divergence in single instruction multiple data (SIMD) architectures can be limited, or avoided, by introducing approximation at the cost of a small quality loss [10]; approximation can also be used to estimate load values in a cache and avoid a miss latency. Other techniques include memoization approaches that reuse results for similar functions or inputs [11] and memory access skipping [12].…”
Section: A. Approximate Hardware
confidence: 99%
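For the memoization technique mentioned above, a hedged sketch in Python (the names and the toy kernel are illustrative, not taken from references [11] or [12]): inputs are quantized so that similar inputs map to the same cache entry and reuse a previously computed result, trading a small quality loss for skipped work.

import math

_cache = {}

def approx_memoized(x, step=0.05):
    # Quantize the input: nearby x values share one key and reuse one result.
    key = round(x / step)
    if key not in _cache:
        _cache[key] = math.exp(math.sin(x))  # stand-in for an expensive kernel
    return _cache[key]

print(approx_memoized(0.123), approx_memoized(0.117))  # second call is a cache hit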
“…Accelerator design for neural networks has become a major line of computer architecture research in recent years. A handful of prior works explored the design space of neural network acceleration, which can be categorized into ASICs [15], [16], [18]–[22], [26], [27], [30], [34], [37], [38], [41], [42], FPGA implementations [17], [28], [35], [36], [43], using unconventional devices for acceleration [29], [33], [40], and dataflow optimizations [16], [23]–[25], [31], [32], [39]. Most of these studies have focused on accelerator design and optimization for merely one specific type of convolutional layer, as the most compute-intensive operation in deep convolutional neural networks.…”
Section: Related Work
confidence: 99%
“…The tool-chain for generating neural network approximations profiles the original code kernels to extract input/output sets and uses this data to train multilayer perceptrons, similar to the Parrot Transformation [24] and Neuralizer [31] processes. The other tools responsible for synthesis, implementation, and simulation are described in Section IV-A.…”
Section: Software Infrastructure
confidence: 99%
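A compact sketch of that profile-then-train flow in Python with NumPy. The target kernel, network sizes, and plain-gradient-descent loop are assumptions for illustration; the actual tool-chain (Parrot Transformation [24], Neuralizer [31]) is described in the cited work.

import numpy as np

def target_kernel(x, y):
    # The code region to be approximated (a stand-in for a real kernel).
    return np.sqrt(x * x + y * y)

# 1) Profile: run the original kernel on representative inputs, log I/O pairs.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(4096, 2))
T = target_kernel(X[:, 0], X[:, 1])[:, None]

# 2) Train a small multilayer perceptron on the logged pairs.
W1, b1 = rng.normal(scale=0.5, size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)
lr = 0.05
for _ in range(2000):
    H = np.tanh(X @ W1 + b1)             # forward pass, hidden layer
    P = H @ W2 + b2                      # forward pass, output
    E = P - T                            # error on the profiled outputs
    gW2 = H.T @ E / len(X); gb2 = E.mean(0)
    dH = (E @ W2.T) * (1 - H * H)        # backprop through tanh
    gW1 = X.T @ dH / len(X); gb1 = dH.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

# 3) The trained weights replace the region: one branch-free pass per input.
print(float(np.mean(np.abs(P - T))))     # mean absolute approximation error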