2016 29th International Conference on VLSI Design and 2016 15th International Conference on Embedded Systems (VLSID) 2016
DOI: 10.1109/vlsid.2016.113

Efficient Realization of Table Look-Up Based Double Precision Floating Point Arithmetic

Abstract: In this paper we present different optimization techniques on look-up table based algorithms for double precision floating point arithmetic. Based on our analysis of different look-up table based algorithms in the literature, we re-engineer the basic blocks of the algorithms (i.e., multiplier(s) and adder(s)) to obtain area and timing benefits and achieve higher performance. We propose different look-up table optimization techniques for the algorithms. We also analyze the trade-off in employing exact rounding (0.5u…

Cited by 9 publications (5 citation statements)
References 20 publications
“…FPS has several resources to perform computations. In this exposition, we use carefully designed DOT4, a square root, and a divider for realization of DGEQR2, DGEQRF, DGEQR2HT, and DGEQRFHT routines [21] [22]. Logical place of arithmetic units is shown in the figure 12 and structure of DOT4 is shown in figure 13.…”
Section: Custom Realization Of Householder Transform and Results (mentioning)
confidence: 99%
“…We show that sequential realization in PE and parallel realization of GGR based QR factorization in REDEFINE are scalable. Furthermore, it is shown that the speed-up in parallel realization in REDEFINE over sequential realization in PE is commensurate with the hardware resources employed in REDEFINE and the speed-up asymptotically approaches theoretical peak of REDEFINE CGRA. For our implementations in PE and REDEFINE, we have used double precision Floating Point Unit (FPU) presented in [14] with recommendations presented in [15]. Organization of the papers is as follows: In section 2, we discuss about CGR, REDEFINE and some of the FPGA, multicore, and GPGPU based realizations of QR factorization.…”
Section: Introduction (mentioning)
confidence: 99%
“…For our experiments, we use Processing Element (PE) design presented in [9]. We optimize Floating Point Unit (FPU) design presented in [13] with recommendations presented in [12] for optimum Instructions Per Cycle (IPC). The paper is organized as follows: In section 2, MFA, KF and REDEFINE are discussed.…”
Section: Introduction (mentioning)
confidence: 99%
“…Major reason for centralization of efforts toward software optimizations and efficient exploitation of memory hierarchy is mainly due to several architectural parameters that are not in the control of programmer [16]. For example, the depth of the pipeline (pipeline stages) in the underlying platform [17]. In this paper, we present a theoretical framework that assists in establishing a relation between pipeline depth of different floating point operations with size and type of the workload.…”
Section: Introduction (mentioning)
confidence: 99%