Partitioning and Vectorizing Binary Applications for a Reconfigurable Vector Computer
2014
DOI: 10.1007/978-3-319-05960-0_13
Partitioning and Vectorizing Binary Applications for a Reconfigurable Vector Computer

Cited by 5 publications (4 citation statements)
References 13 publications
“…Yet, programmable hardware such as FPGAs, as a platform for custom-built accelerator designs [Kenter et al. 2012, 2014; Strzodka and Goddeke 2006], can make effective use of all of these, but also entirely custom number formats. Developers can specify the number of exponent and mantissa bits and trade off precision against the amount of memory blocks required to store values and the number of logic elements required to perform arithmetic operations on them.…”
Section: Approximate Computing
confidence: 99%
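The trade-off described above can be sketched numerically. The function below is a hypothetical illustration (not from the cited work): given exponent and mantissa widths for a sign/exponent/mantissa format with an implicit leading 1 and IEEE-style bias, it reports the storage cost in bits, the relative precision, and the largest finite value, so narrower custom formats can be compared against IEEE 754 single precision.

```python
def format_properties(exp_bits: int, man_bits: int):
    """Properties of a hypothetical custom float format:
    1 sign bit + exp_bits exponent bits + man_bits mantissa bits,
    with an implicit leading 1 and an IEEE-style exponent bias."""
    total_bits = 1 + exp_bits + man_bits        # storage per value
    bias = (1 << (exp_bits - 1)) - 1            # IEEE-style bias
    ulp = 2.0 ** -man_bits                      # relative precision
    max_exp = (1 << exp_bits) - 2 - bias        # largest finite exponent
    max_val = (2.0 - ulp) * 2.0 ** max_exp      # largest finite value
    return total_bits, ulp, max_val

# IEEE 754 single precision for reference: 8 exponent, 23 mantissa bits
single = format_properties(8, 23)   # (32 bits, 2**-23, ~3.40e38)
# A narrower custom format saves memory blocks at the cost of precision:
custom = format_properties(6, 11)   # (18 bits, 2**-11, ~4.29e9)
```

On an FPGA, the smaller `total_bits` translates directly into fewer memory blocks per stored value and narrower arithmetic datapaths, which is the trade-off the quoted passage describes.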
“…Convey includes a compiler to target this Vector Personality by annotating source code with pragmas; however, we found it to be limited to simple array data structures and simple loop nesting patterns, which often requires significant code adaptations besides adding the vectorization pragmas. We fixed many of these shortcomings with the toolflow proposed in [26]; however, for the comparison of architectural overheads of the overlay, we wanted to achieve the best possible performance. Therefore, for this work, we designed all kernels by hand in assembly code, exploiting, on top of the capabilities of the automated toolflow, additional opportunities such as vector partitioning, vector register rotation, and enhanced reuse of partially computed addresses.…”
Section: Convey HC-1 Platform With Vector Processor Overlay
confidence: 99%
“…The work on instruction-set extensions is an example of partitioning usually limited to the migration to custom hardware of acyclic short sequences of instructions (see, e.g., [19]). In approaches where the RPU is loosely coupled to the GPP, as a co-processor, it is common to execute larger code sections (such as entire loops) [20], [14], [21], [22]. We briefly describe next the approaches most relevant to our work and present in TABLE I a summary of the reported speedups.…”
Section: Related Work
confidence: 99%
“…The binary is modified in order to add instructions for configuration and communication from/to DySER blocks. Another approach [22] maps loops in LLVM IR to a Vector Personality softcore for the Convey HC-1. At compile-time, a toolchain (including LLVM and Convey Compiler infrastructures) automatically identifies suitable loops (including outer loops) for vectorization.…”
Section: Related Work
confidence: 99%