2015 28th International Conference on VLSI Design
DOI: 10.1109/vlsid.2015.31
Micro-architectural Enhancements in Distributed Memory CGRAs for LU and QR Factorizations

Cited by 14 publications (21 citation statements)
References 23 publications
“…Based on our case studies on dgemm, dgeqrf, and dgetrf, it can be inferred that the performance attained on the latest multicore and GPGPU platforms is hardly 50-52% of the theoretical peak, even for highly parallel operations like dgemm. Due to this shortcoming of multicores and GPGPUs, we choose for our implementation the customizable platform presented in [9], which is capable of achieving up to 74% of the theoretical peak in dgemm [9][14].…”
Section: Performance Evaluation Of Dgeqrf Dgetrf and Dgemm On Multicmentioning
confidence: 99%
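The percent-of-peak figures quoted above follow from dividing sustained throughput by the platform's theoretical peak. A minimal sketch of that arithmetic (the core counts, frequency, and FLOP rates below are illustrative placeholders, not measurements from the cited paper):

```python
# Efficiency = sustained GFLOP/s divided by theoretical peak GFLOP/s.
# All numbers here are illustrative, not taken from the paper.

def peak_gflops(cores, freq_ghz, flops_per_cycle):
    """Theoretical peak throughput of a multicore CPU in GFLOP/s."""
    return cores * freq_ghz * flops_per_cycle

def efficiency(sustained_gflops, peak):
    """Fraction of theoretical peak actually sustained."""
    return sustained_gflops / peak

# Hypothetical 8-core, 2.5 GHz machine with 16 FLOPs/cycle per core:
peak = peak_gflops(cores=8, freq_ghz=2.5, flops_per_cycle=16)  # 320 GFLOP/s
print(efficiency(166.4, peak))  # ~0.52, i.e. roughly 52% of peak
```

A dgemm run sustaining about half this peak corresponds to the 50-52% efficiency the citing authors report for multicore and GPGPU platforms.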
“…Coarse Grained Reconfigurable Architectures (CGRAs) have been an active topic of research due to their power-performance and flexibility [9]. CGRAs support domain customization, aiming to combine the performance of Application Specific Integrated Circuits (ASICs) with the flexibility of Field Programmable Gate Arrays (FPGAs) through the presence of ASIC-like structures [14][8].…”
Section: Introductionmentioning
confidence: 99%
“…The focus of [28] is on emulation of a systolic schedule for GR on REDEFINE, and hence on synthesis of a systolic array on REDEFINE. Other optimizations on REDEFINE CGRAs are also attempted in [111,112], first targeting algorithmic optimization, which is then exploited for CGRA-based acceleration.…”
Section: Cgrasmentioning
confidence: 99%
“…A recently enhanced REDEFINE CGRA for NLA [112] shows comparable values at 65nm, and better performance density when scaled to 45nm with custom DOT-product units; however, no execution times for large matrices could be found. Based on the latencies reported in [112] for matrix multiplication on 60×60 and 120×120 data sizes, the slowest variant of Layers performs 4.2× and 2.8× faster on 64×64 and 128×128 data sets, respectively.…”
Section: Redefinementioning
confidence: 99%
“…Performance of a system involving such floating point operations and functions depends on the attributes of the Floating Point Unit (FPU). Look-up-table-based methods are highly popular in the realization of FPUs due to their fast convergence to the final result [5][6][7][8]. These methods use binary adders, multipliers, and priority encoders as basic building blocks, along with pre-computed table look-ups, for the computation of elementary operations and transcendental functions.…”
Section: Introductionmentioning
confidence: 99%
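The table-lookup approach described in this citation can be sketched in software: precompute a transcendental function at evenly spaced points, then refine a lookup with linear interpolation (the hardware analogue performs the interpolation with the adders and multipliers mentioned above). The table size and target function here are illustrative choices, not drawn from the cited works:

```python
import math

# Illustrative look-up-table method for a transcendental function:
# precompute sin(x) at 2^10 evenly spaced points on [0, pi/2], then
# linearly interpolate between adjacent table entries.

TABLE_BITS = 10
N = 1 << TABLE_BITS                    # number of table intervals
STEP = (math.pi / 2) / N               # spacing between table entries
SIN_TABLE = [math.sin(i * STEP) for i in range(N + 1)]

def sin_lut(x):
    """Approximate sin(x) on [0, pi/2] via table lookup + interpolation."""
    idx = min(int(x / STEP), N - 1)    # index of the table interval
    frac = x / STEP - idx              # fractional position inside it
    return SIN_TABLE[idx] + frac * (SIN_TABLE[idx + 1] - SIN_TABLE[idx])

# Maximum absolute error over a fine sweep of the domain:
err = max(abs(sin_lut(k * 1e-3) - math.sin(k * 1e-3))
          for k in range(0, 1571))
```

With a 1K-entry table the linear-interpolation error stays below about 10^-6; hardware designs trade table size against the number of refinement operations to reach the precision an FPU requires.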