Compressive Sensing (CS) signal reconstruction can be implemented using convex relaxation, non-convex, or local optimization algorithms. Although reconstruction using convex optimization, such as the Iterative Hard Thresholding algorithm, is more accurate than reconstruction using matching pursuit algorithms, most researchers focus on matching pursuit algorithms because they are less computationally complex. Orthogonal Matching Pursuit (OMP) is a greedy algorithm that, at each iteration, chooses the most significant variable to reduce the least-squares error. In this paper, we propose an efficient parallel architecture for OMP CS reconstruction. For the architecture implementation, we perform measurement and sparsity analysis to reduce complexity. The proposed architecture is platform independent and is implemented on seven different platforms, including general-purpose CPUs, GPUs, a Virtex-7 FPGA, and a domain-specific many-core. The implementation results indicate that reconstruction time on the FPGA is improved by 3× compared to the previous FPGA implementation, whereas the GPU implementation is 4× faster than the previously proposed GPU-based OMP architecture. The CPU implementation is 6× faster than the previous CPU-based implementation. The domain-specific many-core achieves 24× faster reconstruction than both the GPU and CPU implementations.
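
For orientation, the greedy selection and least-squares steps of OMP described above can be sketched as follows. This is a minimal serial reference sketch in Python with NumPy, not the parallel architecture proposed in this paper; the function name `omp` and the parameters `Phi` (measurement matrix), `y` (measurement vector), `sparsity`, and `tol` are illustrative assumptions.

```python
import numpy as np


def omp(Phi, y, sparsity, tol=1e-10):
    """Estimate a sparse x such that y is approximately Phi @ x.

    Illustrative serial OMP; all names here are assumptions for this sketch.
    """
    Phi = np.asarray(Phi, dtype=float)
    y = np.asarray(y, dtype=float)
    n = Phi.shape[1]
    residual = y.copy()
    support = []            # indices of the columns selected so far
    coeffs = np.zeros(0)

    for _ in range(sparsity):
        # Greedy step: pick the column most correlated with the residual.
        correlations = np.abs(Phi.T @ residual)
        if support:
            correlations[support] = 0.0  # never reselect a column
        support.append(int(np.argmax(correlations)))

        # Least-squares fit of y on the selected columns only.
        coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coeffs

        # Stop early once the measurements are explained.
        if np.linalg.norm(residual) < tol:
            break

    x_hat = np.zeros(n)
    x_hat[support] = coeffs
    return x_hat
```

Calling `omp(Phi, y, sparsity=k)` returns a length-`n` estimate with at most `k` nonzero entries. The least-squares solve inside the loop typically dominates the cost of each iteration, so it is the natural step to target when reducing complexity.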