This paper introduces a new, versatile, high-performance parallel hardware engine for matrix computations. The proposed architecture reduces memory bandwidth by exploiting data redundancies and employing distributed memory structures. It is designed to make better use of the on-chip area for different types of matrix computations, such as matrix power, multiplication, and inversion. The matrix power computation presented in this paper is shown to be two times faster than the conventional computation. In addition, the architecture is optimized to perform least-squares computations in signal processing applications. Synthesis results on FPGA platforms indicate that the proposed architecture can operate at 75 MHz for a 16-bit word length, and the peak attained performance is about 2400 MMAC operations per second with 32 concurrent MAC modules.
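As a quick consistency check on the quoted peak figure (the arithmetic below is derived from the numbers in the abstract rather than stated in it), 32 MAC units each completing one multiply-accumulate per cycle at 75 MHz give

\[
32 \times 75\ \text{MHz} = 2400\ \text{MMAC/s}.
\]

For reference, the least-squares problem mentioned above is commonly solved through the normal equations,

\[
\hat{x} = \left(A^{\mathsf{T}} A\right)^{-1} A^{\mathsf{T}} b,
\]

which combine exactly the matrix multiplication and inversion operations the engine supports. This formulation is standard background, not a detail taken from the paper itself.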
A pre-computation-based technique to lower the power consumption of sequential multipliers is presented. This technique also speeds up multiplication by reducing the number of clock ticks required to complete a multiplication. The proposed technique may be applied to different sequential multiplication schemes. Benchmark data are extracted from typical DSP applications to show the efficiency of the proposed technique in the domain of DSP computations, where low-power computing is of rapidly increasing importance. The results show an average reduction of 25% in switching activity and 30% in clock tick count compared to sequential multipliers without this technique.
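The abstract does not spell out the pre-computation mechanism. One common way a pre-computation step can cut both clock ticks and switching activity in a shift-add sequential multiplier is to examine the multiplier bits up front and skip the cycles whose bit is zero. The sketch below models that idea in C purely as an illustration; the skip-zero-bits scheme, the function names, and the operand values are assumptions, not the paper's actual method.

```c
#include <stdint.h>
#include <stdio.h>

/* Baseline shift-add multiplier model: one clock tick per multiplier bit,
 * regardless of whether the bit contributes a partial product. */
static uint32_t shift_add_mul(uint16_t a, uint16_t b, int *ticks)
{
    uint32_t acc = 0;
    *ticks = 0;
    for (int i = 0; i < 16; i++) {
        acc += (uint32_t)((b >> i) & 1u) * ((uint32_t)a << i);
        (*ticks)++;                      /* tick spent even for zero bits */
    }
    return acc;
}

/* Illustrative "pre-computed" variant: the multiplier is scanned in
 * advance, so cycles with a zero bit are skipped entirely, which reduces
 * both the tick count and the datapath switching they would cause. */
static uint32_t precomp_mul(uint16_t a, uint16_t b, int *ticks)
{
    uint32_t acc = 0;
    *ticks = 0;
    for (int i = 0; i < 16; i++) {
        if ((b >> i) & 1u) {             /* only non-zero bits use a tick */
            acc += (uint32_t)a << i;
            (*ticks)++;
        }
    }
    return acc;
}

int main(void)
{
    int t_plain, t_pre;
    uint16_t a = 1234, b = 0x00F0;       /* sparse multiplier: many zero bits */
    printf("plain:   %u in %d ticks\n", shift_add_mul(a, b, &t_plain), t_plain);
    printf("precomp: %u in %d ticks\n", precomp_mul(a, b, &t_pre), t_pre);
    return 0;
}
```

For a multiplier word with few set bits, as is common in DSP coefficient data, the second routine finishes in far fewer ticks while producing the same product, which is the general kind of saving the abstract reports.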