Matrix operations, like matrix multiplication, are commonly used in almost all areas of scientific research. Matrix multiplication has significant application in the areas of graph theory, numerical algorithms, signal processing, and digital control. Matrix multiplication is a computationally intensive problem, especially the design and efficient implementation on an FPGA where resources are very limited, has been more demanding. FPGA based designs are usually evaluated using three performance metrics: speed (latency), area, and power (energy). Fixed point implementations in FPGA are fast and have minimal power consumption. With today's applications requiring ever higher computational throughputs, distributed memory approach is an effective solution for real-time applications. This application shows how to achieve higher computational throughput via parallel processing with the DSP processors. The matrix-vector multiplication applied to calculate linear convolution. This paper presents an FPGAbased hardware realization of matrix multiplication based on distributed memory approach architecture. We propose an architecture that is capable of handling matrices of variable sizes our designs minimize the gate count, area, improvements in latency, computational time, and throughput for performing matrix multiplication and reduces the number of multiplication and additions hardware required to get the matrices multiplied on commercially available FPGA devices.