2009 International Conference on Field Programmable Logic and Applications (FPL 2009)
DOI: 10.1109/fpl.2009.5272287

A fast parallel matrix multiplication reconfigurable unit utilized in face recognitions systems

Abstract: In this paper we present a reconfigurable device which significantly improves the execution time of the most computationally intensive functions of three of the most widely used face recognition algorithms; those tasks multiply very large dense matrices. The presented architecture utilizes numerous digital signal processing units (DSPs) organized in a parallel manner within a state-of-the-art FPGA device. In order to accelerate those functions we have implemented a "blocked" matrix multiplication algorithm which …
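The abstract names a "blocked" (tiled) matrix multiplication scheme mapped onto parallel DSP units. As a point of reference only, here is a minimal software sketch of the general blocked-multiplication idea; the tile size BLOCK, the row-major layout, and the float arithmetic are illustrative assumptions, not details taken from the paper.

```c
#include <stddef.h>

#define BLOCK 32  /* tile size; an assumption, not the block size used in the paper */

/* Blocked (tiled) multiplication of row-major n x n matrices: C += A * B.
 * Each BLOCK x BLOCK tile of C is computed from the corresponding tiles of
 * A and B; this is the general scheme an FPGA design would map onto on-chip
 * buffers and parallel DSP multiply-accumulate units. */
void matmul_blocked(size_t n, const float *A, const float *B, float *C)
{
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t kk = 0; kk < n; kk += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK)
                for (size_t i = ii; i < ii + BLOCK && i < n; ++i)
                    for (size_t k = kk; k < kk + BLOCK && k < n; ++k) {
                        float a = A[i * n + k];
                        for (size_t j = jj; j < jj + BLOCK && j < n; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```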

Cited by 13 publications (8 citation statements); references 11 publications.

Citation statements:
“…They concluded that for the smaller data size the FPGA was faster and the GPU was faster at the larger data size. Sotiropoulos et al designed an FPGA matrix-matrix multiplication architecture [3] and compared its performance to a standard CPU implementation. This comparison was only for specifically sized matrices and did not discuss their CPU implementation.…”
Section: Related Work
confidence: 99%
“…To evaluate our approach, we compare the matrix-matrix multiplication against two existing approaches [45] and [46]. These approaches implement a blocked matrix multiplication algorithm with fixed-point arithmetic on FPGAs.…”
Section: Comparison with Existing Approaches
confidence: 99%
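The statement above describes the cited designs as blocked matrix multiplication with fixed-point arithmetic on FPGAs. A minimal sketch of a fixed-point multiply-accumulate in an assumed Q16.16 format follows; the format and the helper names (fx_mul, fx_dot) are hypothetical and are not taken from [45] or [46].

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t q16_16_t;  /* assumed Q16.16 format: 16 integer, 16 fractional bits */

/* Multiply two Q16.16 values: widen to 64 bits, then shift the extra
 * fractional bits back out. */
static inline q16_16_t fx_mul(q16_16_t a, q16_16_t b)
{
    return (q16_16_t)(((int64_t)a * (int64_t)b) >> 16);
}

/* Fixed-point dot product: the multiply-accumulate a DSP slice would
 * perform for one output element of a blocked product. */
q16_16_t fx_dot(const q16_16_t *a, const q16_16_t *b, size_t n)
{
    q16_16_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += fx_mul(a[i], b[i]);
    return acc;
}
```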
“…Compared to [45], our approach is up to four times faster, since our approach extracts higher parallelism by exploiting MapReduce and pipelining. Compared to [46], our approach is slower by a factor of 1.2 to 3.8, because Sotiropoulos and Papaefstathiou [46] use double buffering to pipeline data input/output with computation. As matrix sizes increase, time spent on data loading/unloading increases and thus the performance difference between our approach and [46] increases.…”
Section: Comparison with Existing Approaches
confidence: 99%
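The statement above attributes the speed advantage of [46] to double buffering, i.e., overlapping the transfer of the next data block with computation on the current one. A schematic ping-pong sketch of that pattern follows; load_tile, compute_tile, and the tile size are hypothetical placeholders, and on the actual FPGA the load and the computation proceed concurrently rather than back-to-back as in this sequential sketch.

```c
#include <stddef.h>

#define TILE_ELEMS (64 * 64)   /* assumed tile size; not taken from the cited designs */

/* Hypothetical stand-ins: in the FPGA designs these would be a DMA transfer
 * from external memory and the DSP-array tile multiplication; here they
 * only mark the call sites. */
static void load_tile(float *dst, size_t tile_idx)      { (void)dst; (void)tile_idx; }
static void compute_tile(const float *tile, size_t idx) { (void)tile; (void)idx; }

/* Ping-pong (double-buffered) tile processing: while tile t is consumed
 * from one buffer, tile t+1 is staged into the other, so data transfer
 * can be hidden behind computation. */
void process_all_tiles(size_t num_tiles)
{
    static float buf[2][TILE_ELEMS];

    if (num_tiles == 0)
        return;
    load_tile(buf[0], 0);                        /* prefetch the first tile */
    for (size_t t = 0; t < num_tiles; ++t) {
        if (t + 1 < num_tiles)
            load_tile(buf[(t + 1) & 1], t + 1);  /* stage the next tile */
        compute_tile(buf[t & 1], t);             /* consume the current tile */
    }
}
```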
“…Some designs focus on implementing image processing operations which have not been accomplished on the FPGA platform yet (Kokufuta and Maruyama, 2009). New design methodologies of image processing algorithms are proposed (Plavec et al, 2009), while existing algorithms are accelerated by implementing computing intensive routines in FPGA resources (Sotiropoulos and Papaefstathiou, 2009). Comparison of speed-up factor for various implementation platforms, i.e., GPU, CPU (GPP) and the FPGA, is also considered (Asano et al, 2009;Claus et al, 2009).…”
Section: Introduction
confidence: 99%