2014 IEEE 32nd International Conference on Computer Design (ICCD)
DOI: 10.1109/iccd.2014.6974700

Accelerating divergent applications on SIMD architectures using neural networks

Abstract: In this work, we investigate neural-network-based solutions to the well-known problem of branch divergence in Single Instruction Multiple Data (SIMD) architectures. Our approach isolates code regions with performance degradation due to branch divergence, trains neural networks (NNs) offline to approximate these regions, and replaces the regions with their NN approximations. By directly manipulating source code, this platform-agnostic methodology translates control flow into nondivergent computation, trading off…
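A minimal sketch of the idea in Python with NumPy: a per-element branch (the divergent region) is replaced by a small multilayer perceptron evaluated identically for every lane, so data-dependent control flow disappears. The toy kernel and the placeholder weights are illustrative assumptions, not the authors' code; a trained approximator would come from the offline training step.

import numpy as np

def divergent_kernel(x):
    # Original region: a data-dependent branch. On SIMD hardware, lanes whose
    # predicate differs are serialized (branch divergence).
    out = np.empty_like(x)
    for i, v in enumerate(x):
        if v > 0.0:
            out[i] = np.sin(v) * v
        else:
            out[i] = np.exp(v) - 1.0
    return out

# Placeholder MLP weights; in the paper's flow these would be trained offline.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(1, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

def nn_kernel(x):
    # Replacement region: only multiply-adds and one activation, identical
    # work in every lane, hence no divergence.
    h = np.tanh(x[:, None] @ W1 + b1)   # hidden layer
    return (h @ W2 + b2).ravel()        # linear output

x = rng.uniform(-2.0, 2.0, size=8)
print(divergent_kernel(x))  # exact result, divergent control flow
print(nn_kernel(x))         # approximate result, branch-free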

Cited by 20 publications (4 citation statements) | References 23 publications
“…A floating-point unit (with reduced precision), or a fixed-point unit, can be chosen carefully by a graphics processing unit (GPU) architecture to save power [9]. The branch divergence in single instruction multiple data (SIMD) architectures can be limited, or avoided, by introducing approximation at the cost of a small quality loss [10]; approximation can also be used to estimate load values in a cache and avoid a miss latency. Other techniques include memoization approaches that reuse results for similar functions or inputs [11] and memory access skipping [12].…”
Section: A. Approximate Hardware
confidence: 99%
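For the memoization technique mentioned above, a hedged sketch in Python (the names and the toy kernel are illustrative, not taken from references [11] or [12]): inputs are quantized so that similar inputs map to the same cache entry and reuse a previously computed result, trading a small quality loss for skipped work.

import math

_cache = {}

def approx_memoized(x, step=0.05):
    # Quantize the input: nearby x values share one key and reuse one result.
    key = round(x / step)
    if key not in _cache:
        _cache[key] = math.exp(math.sin(x))  # stand-in for an expensive kernel
    return _cache[key]

print(approx_memoized(0.123), approx_memoized(0.117))  # second call is a cache hit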
“…Accelerator design for neural networks has become a major line of computer architecture research in recent years. A handful of prior works explored the design space of neural network acceleration, which can be categorized into ASICs [15], [16], [18]–[22], [26], [27], [30], [34], [37], [38], [41], [42], FPGA implementations [17], [28], [35], [36], [43], using unconventional devices for acceleration [29], [33], [40], and dataflow optimizations [16], [23]–[25], [31], [32], [39]. Most of these studies have focused on accelerator design and optimization for merely one specific type of convolutional layer, as the most compute-intensive operation in deep convolutional neural networks.…”
Section: Related Work
confidence: 99%
“…The tool-chain for generating neural network approximations profiles the original code kernels to extract input/output sets and uses this data to train multilayer perceptrons, similar to the Parrot Transformation [24] and Neuralizer [31] processes. The other tools responsible for synthesis, implementation, and simulation are described in Section IV-A.…”
Section: Software Infrastructure
confidence: 99%
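A compact sketch of that profile-then-train flow in Python with NumPy. The target kernel, network sizes, and plain-gradient-descent loop are assumptions for illustration; the actual tool-chain (Parrot Transformation [24], Neuralizer [31]) is described in the cited work.

import numpy as np

def target_kernel(x, y):
    # The code region to be approximated (a stand-in for a real kernel).
    return np.sqrt(x * x + y * y)

# 1) Profile: run the original kernel on representative inputs, log I/O pairs.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(4096, 2))
T = target_kernel(X[:, 0], X[:, 1])[:, None]

# 2) Train a small multilayer perceptron on the logged pairs.
W1, b1 = rng.normal(scale=0.5, size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)
lr = 0.05
for _ in range(2000):
    H = np.tanh(X @ W1 + b1)             # forward pass, hidden layer
    P = H @ W2 + b2                      # forward pass, output
    E = P - T                            # error on the profiled outputs
    gW2 = H.T @ E / len(X); gb2 = E.mean(0)
    dH = (E @ W2.T) * (1 - H * H)        # backprop through tanh
    gW1 = X.T @ dH / len(X); gb1 = dH.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

# 3) The trained weights replace the region: one branch-free pass per input.
print(float(np.mean(np.abs(P - T))))     # mean absolute approximation error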