The model of membrane computing, also known under the name of P systems, is a bio-inspired large-scale parallel computing paradigm having a good potential for the design of massively parallel algorithms. For its implementation it is very natural to choose hardware platforms that have important inherent parallelism, such as field-programmable gate arrays (FPGAs) or compute unified device architecture (CUDA)-enabled graphic processing units (GPUs). This article performs an overview of all existing approaches of hardware implementation in the area of P systems. The quantitative and qualitative attributes of FPGA-based implementations and CUDA-enabled GPU-based simulations are compared to evaluate the two methodologies.