This paper presents a novel parallel breadth-first detection scheme for Multiple-Input Multiple-Output (MIMO) systems, called the Partial Inter-layer Parallel Sphere Decoder (PIPS-D). By introducing a new form of complex-to-real lattice transformation, the proposed algorithm can compute every two adjacent layers of the equivalent search tree simultaneously, which significantly reduces the total execution time. In addition, local sorting and fast combination are applied to further speed up and simplify the decoding process. Test results on a Xilinx Virtex-6 FPGA show that, at a cost of less than 2 dB of BER performance loss, the proposed algorithm achieves a throughput of 356 Mbps for 4×4 16-QAM MIMO systems, far higher than traditional detection schemes such as the sphere decoder (SD) and K-best.

Index Terms: MIMO, sphere decoder, parallel, FPGA
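The abstract does not spell out the new complex-to-real transformation. As an illustration only, the sketch below shows one mapping that has the "two adjacent layers at once" property: interleaving the real and imaginary parts of each complex symbol, `[Re x1, Im x1, Re x2, Im x2, ...]`. With this ordering, the two real columns belonging to one complex symbol are orthogonal and of equal norm, so in the R factor of a QR decomposition the coupling entry between those two layers is zero and their diagonal entries are equal. During back-substitution, the two real layers of the same symbol therefore do not depend on each other and can be expanded in the same step. This is an assumption about how such parallelism can arise, not the PIPS-D construction itself; `interleaved_real_model` and `upper_r` are hypothetical helper names.

```python
import random


def interleaved_real_model(H):
    # Hypothetical helper: map a complex n_r x n_t channel H (list of lists)
    # to a 2n_r x 2n_t real matrix using the interleaved ordering
    # [Re x1, Im x1, Re x2, Im x2, ...] for x, and the same ordering for y.
    n_r, n_t = len(H), len(H[0])
    Hr = [[0.0] * (2 * n_t) for _ in range(2 * n_r)]
    for i in range(n_r):
        for k in range(n_t):
            a, b = H[i][k].real, H[i][k].imag
            # (a + jb)(xr + j xi) = (a xr - b xi) + j (b xr + a xi)
            Hr[2 * i][2 * k], Hr[2 * i][2 * k + 1] = a, -b
            Hr[2 * i + 1][2 * k], Hr[2 * i + 1][2 * k + 1] = b, a
    return Hr


def upper_r(A):
    # Hypothetical helper: R factor of a thin QR decomposition of A,
    # computed with modified Gram-Schmidt (A assumed to have full column rank).
    m, n = len(A), len(A[0])
    R = [[0.0] * n for _ in range(n)]
    Q = []  # orthonormal columns found so far
    for j in range(n):
        v = [A[i][j] for i in range(m)]
        for i, q in enumerate(Q):
            # Project the current residual onto q_i and remove that component
            R[i][j] = sum(qt * vt for qt, vt in zip(q, v))
            v = [vt - R[i][j] * qt for vt, qt in zip(v, q)]
        R[j][j] = sum(vt * vt for vt in v) ** 0.5
        Q.append([vt / R[j][j] for vt in v])
    return R


if __name__ == "__main__":
    random.seed(1)
    n = 4  # 4x4 MIMO, as in the abstract's test case
    H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
         for _ in range(n)]
    R = upper_r(interleaved_real_model(H))
    # Largest coupling between the two real layers of any complex symbol
    print(max(abs(R[2 * k][2 * k + 1]) for k in range(n)))
```

The printed coupling should be zero up to floating-point rounding. By contrast, the conventional block ordering `[Re x; Im x]` places the real and imaginary parts of a symbol far apart in the tree, so adjacent layers are generally coupled.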