Implementation of a double-precision multiplier accumulator with exception treatment to a dense matrix multiplier module in FPGA

Barros, Abner C.; Medeiros, Victor Wanderley Costa de; Souza, Victor L. F.; Nascimento, Paulo Sérgio B.; Mazer, Ângelo; Barbosa, João Paulo; Neves, Bruno P.; Santos, Ismael; Lima, Maria Luíza Carvalho de

doi:10.1145/1404371.1404392

Cited by 3 publications

(5 citation statements)

References 6 publications

“…The MAC has a pipeline with 33 stages. Since the data reuse strategy proposed in [4] substantially reduced the data access bottleneck, the number of MACs that can be instantiated in the FPGA is limited by the number of DSP blocks in the FPGA [5].…”

Section: B. The Architecture (mentioning)

confidence: 99%

An FPGA-Based Accelerator to Speed-Up Matrix Multiplication of Floating Point Operations

Holanda, Pimentel, Barbosa, et al. 2011

2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PHD Forum

Field Programmable Gate Arrays (FPGAs) provide a high degree of computational parallelism that can be exploited to achieve large performance improvements in data-intensive processing problems. In this paper our efforts were directed towards developing a PC cluster based on nodes that use FPGAs as co-processors. The target application is large dense floating-point matrix multiplication. Experimental results for a single node of the cluster, consisting of a Xilinx Virtex 5 VLX50T with a PCI interface, showed a speed-up of 1.19 times over an Intel Core2 Quad at 2.66 GHz. Further analyses of frequency variation and power dissipation were carried out for different matrix sizes running on one node of the cluster. Recently, the platform was upgraded to a more powerful Gidel platform, the PROCe III 260E, which has one Stratix III FPGA per board. On this board it is possible to allocate up to 40 MACs per FPGA, reaching an overall speed-up of approximately 11.2 per node of the cluster when compared with the same general-purpose processor. A full example is presented in this paper.

“…Hence, the order of matrices operated have to meet the condition The second consideration is the compromise between the number of MACs (which is limited by the amount of DSPs blocks in the FPGA) [17] and the memory bandwidth available in the architecture, given by…”

Section: Data Reuse Exploitation Strategy (mentioning)

confidence: 99%

“…• the processing block is similar to the one presented in Figure 4 and uses accumulative multipliers (MACs) in double precision floating-point, according to IEEE-754 standard [17]. The Figure 6 shows a block diagram of the developed architecture.…”

Section: Case Study - Processing Architecture (mentioning)

confidence: 99%

Architecture for dense matrix multiplication on a high-performance reconfigurable system

Souza, Medeiros, Lima 2009

Proceedings of the 22nd Annual Symposium on Integrated Circuits and System Design: Chip on the Dunes

The recent evolution of programmable logic devices such as FPGAs (Field Programmable Gate Arrays), together with the growing demand for performance improvements in scientific computing applications, has attracted the attention of supercomputer vendors. They have been developing hybrid platforms that link general-purpose processors with FPGA-based coprocessors, aiming at computing acceleration. In this work we present the analysis and development of an important scientific computing operation, matrix multiplication, targeting the commercial hybrid platform RASC (Reconfigurable Application-Specific Computing), developed by Silicon Graphics. The proposed architecture aims to reach better performance than conventional architectures while dissipating less power. To achieve this goal, we investigated the opportunities for parallel implementation and data reuse intrinsic to the algorithm. Based on this investigation we propose a case study that uses the available resources in the target platform to explore these features.
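
The citation statements above describe a processing block built from double-precision IEEE-754 multiplier-accumulators (MACs) feeding a dense matrix product. The Python sketch below is only a software illustration of that idea, not the authors' hardware design. The function names `mac` and `matmul_mac` and the two flag names are assumptions made for this example; the pipeline depth and the data reuse strategy mentioned in the statements are not modeled, and the software step rounds twice (once for the product, once for the sum), which may differ from a hardware MAC.

```python
import math


def mac(acc, a, b):
    """One multiply-accumulate step, acc + a * b, on IEEE-754 doubles.

    Returns the new accumulator and two illustrative exception flags
    (hypothetical names, not taken from the cited paper).
    """
    result = acc + a * b
    inputs = (acc, a, b)
    flags = {
        # NaN produced from non-NaN inputs, e.g. inf * 0 or inf + (-inf).
        "invalid": math.isnan(result) and not any(math.isnan(x) for x in inputs),
        # Infinity produced from finite inputs.
        "overflow": math.isinf(result) and not any(math.isinf(x) for x in inputs),
    }
    return result, flags


def matmul_mac(A, B):
    """Dense matrix product C = A * B built from repeated MAC steps.

    A is n x m and B is m x p, both given as lists of rows of floats.
    Returns C and the union of the flags raised by any MAC step.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    raised = {"invalid": False, "overflow": False}
    for i in range(n):
        for j in range(p):
            acc = 0.0
            for k in range(m):
                acc, flags = mac(acc, A[i][k], B[k][j])
                for name in raised:
                    raised[name] = raised[name] or flags[name]
            C[i][j] = acc
    return C, raised
```

Calling `matmul_mac([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])` yields the product together with both flags cleared, while an input such as `[[1e308]]` times `[[10.0]]` sets the `overflow` flag because the product exceeds the double-precision range.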

“…• The processing block is similar to the one shown in Figure 5 and uses double-precision floating-point multiplier-accumulators (MACs), in accordance with the IEEE-754 standard [18].…”

Section: Case Study - Processing Architecture (unclassified)

“…is the number of words that can be stored in the BRAMs; the second consideration is the trade-off that must exist between the number of MACs (which is limited by the number of DSP blocks) [18] and the bandwidth available in the architecture, where bw is the memory bandwidth in bits per second, k is the number of MACs, f is the operating frequency of the FPGA, and N_MAC_DSP is the maximum number of MACs that can be instantiated in the FPGA using the available DSPs.…”

unclassified

Uma abordagem de alto desempenho para multiplicação de matrizes densas em sistemas reconfiguráveis [A high-performance approach to dense matrix multiplication on reconfigurable systems]

Souza, Medeiros, Lima, et al. 2009

Anais Do X Simpósio Em Sistemas Computacionais De Alto Desempenho (SSCAD 2009)

The demand for high-performance machines and for new strategies that seek to improve data processing in scientific computing applications has grown considerably in recent years. New architectures based on GPUs, Cell processors, and FPGAs, as well as hybrid platforms, have emerged as solutions to these problems. In this work we present a high-performance architecture for implementing dense matrix multiplication on a commercial hybrid platform, RASC (Reconfigurable Application-Specific Computing). RASC was developed by Silicon Graphics and consists of a general-purpose processor coupled to FPGA-based coprocessors. The proposed architecture investigates how the solution of the matrix multiplication problem can take advantage of the characteristics of a platform with a high degree of parallelism. We also investigate the scalability of the algorithm and its data reuse mechanisms. Based on these investigations, a case study is proposed and discussed in detail.
