The XG-PON standard for Passive Optical Networks (PONs) requires a Reed-Solomon block code at a 10 Gbps downstream rate, which dictates low-latency, high-throughput processing with no word interleaving and no stall between codewords. This paper presents in detail a parallel architecture that decodes the shortened RS(248,216) code in the XG-PON ONT/ONU receiver. Based on a modified implementation of the Degree-Computationless Modified Euclidean (DCME) algorithm, the designed Key Equation Solver (KES) and its control unit both solve the key equation and compute the number of detected errors in 31 clock cycles. Validating the proposed design on a Xilinx Kintex-7 FPGA and comparing it to a pipelined serial DCME implementation reveals a 48% reduction in the number of occupied slices and a sixfold reduction in the induced latency. Our implementation achieves a throughput of 16 Gbps on the specified device, thus meeting the XG-PON downstream FEC requirements with relatively low effort. Owing to the 64-bit pipelined architecture and the FPGA-transparent HDL design, the results could be adapted to a multitude of optical communication standards based on RS codes.
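The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the presented design: it assumes 8-bit symbols over GF(2^8), which the abstract does not state explicitly, and the function names are hypothetical. Under that assumption, a 64-bit datapath consumes one 248-symbol codeword in 31 clock cycles, which appears consistent with the 31-cycle KES budget needed to avoid stalls between codewords, and a 16 Gbps throughput corresponds to a 250 MHz clock.

```python
# Values taken from the abstract
N, K = 248, 216          # shortened RS codeword and message lengths, in symbols
DATAPATH_BITS = 64       # datapath width
THROUGHPUT_GBPS = 16     # reported throughput

# Assumption (not stated in the abstract): symbols are bytes over GF(2^8)
SYMBOL_BITS = 8


def rs_parameters(n, k):
    """Return (parity symbols, correctable symbol errors t) for RS(n, k)."""
    parity = n - k
    return parity, parity // 2


def cycles_per_codeword(n, symbol_bits, datapath_bits):
    """Clock cycles needed to stream one codeword through the datapath."""
    total_bits = n * symbol_bits
    if total_bits % datapath_bits != 0:
        raise ValueError("codeword does not fill a whole number of words")
    return total_bits // datapath_bits


def clock_mhz(throughput_gbps, datapath_bits):
    """Clock frequency implied by a throughput at a given datapath width."""
    return throughput_gbps * 1e3 / datapath_bits


parity, t = rs_parameters(N, K)
cycles = cycles_per_codeword(N, SYMBOL_BITS, DATAPATH_BITS)
print(f"parity symbols: {parity}, correctable errors t: {t}")
print(f"cycles per codeword: {cycles}")
print(f"implied clock: {clock_mhz(THROUGHPUT_GBPS, DATAPATH_BITS)} MHz")
```

The same helpers can be reused with other (n, k) pairs or datapath widths to estimate the cycle budget a key equation solver would have under a different RS-based standard.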