Field Programmable Gate Arrays (FPGAs) are able to provide a high computational parallelism that can be exploited to achieve high performance improvements in intensive data processing problems. In this paper our efforts were directed towards developing a PC cluster based on nodes that use FPGAs as co-processors. The target application is a floating-point large dense matrix multiplication. Experimental results for just one node of the cluster, consisting of a Xilinx Virtex 5 VLX50T with a PCI interface, showed performance improvements compared with the Intel Core2 Quad at 2.66 GHz, achieving a speed-up of 1.19 times. Other analyses in terms of frequency variation and power dissipation have been made by considering different matrix sizes running in one node of the cluster. Recently, the platform has been updated for a powerful Gidel plaftorm, the PROCe III 260E. This new platform consists of 1 FPGA Stratix III per board. In this board, it is possible to allocate up to 40 MACs per FPGA, reaching an overall speed-up of approximately 11.2 per node of the cluster when compared with the same general-purpose processor. A full example is presented in this paper.
Recently, the manufactures of supercomputers have made use of FPGAs to accelerate scientific applications [16] [17]. Traditionally, the FPGAs were used only on non-scientific applications. The main reasons for this fact are: the floating-point computation complexity; the FPGA logic cells are not sufficient for the scientific cores implementation; the cores complexity prevents them to operate on high frequencies.Nowadays, the increase of specialized blocks availability in complex operations, as sum and multiplier blocks, implemented directly in FPGA and, the increase of internal RAM blocks (BRAMs) have made possible high performance systems that use FPGA as a processing element for scientific computation [2].These devices are used as co-processors that execute intensive computation. The emphasis of these architectures is the exploration of parallelism present on scientific computation operations and data reuse.In major of these applications, the scientific computation uses, in general, operations of big floating-point dense matrices, which are normally operated by MACs.In this work, we describe the architecture of an accumulative multiplier (MAC) in double precision floating-point, according to IEEE-754 standard and we propose the architecture of a multiplier of matrices that uses developed instances of the MACs and explores the reuse of data through the use of the BRAMs (Blocks of RAM internal to the FPGAs) of a Xilinx Virtex 4 LX200 FPGA. The synthesis results showed that the implemented MAC could reach a performance of 4GFLOPs.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.