This article presents a latency-optimized implementation of the SIMECK lightweight block cipher on a field-programmable gate array (FPGA) platform, with block and key lengths of 32 and 64 bits, respectively. The key features of our architecture are parallelism, pipelining, and a dedicated controller. For parallelism, the key and data blocks are split into smaller segments, and the segments are processed in parallel during the encryption and decryption computations, which reduces the number of required clock cycles. Two-stage pipelining shortens the critical path and improves the clock frequency. A dedicated controller provides the control functionality. To evaluate the design, we report implementation results for two cases on Xilinx 7-series FPGA devices. In the first case, the proposed architecture operates at 382, 379, and 388 MHz on Kintex-7, Virtex-7, and Artix-7 devices, respectively, and occupies 49, 51, and 50 slices on the same devices. Each encryption or decryption computation takes 16 clock cycles. The minimum power consumption is 172 mW, on the Kintex-7 device. In the second case, we synthesized the design at a common clock frequency of 50 MHz on the Kintex-7, Virtex-7, and Artix-7 devices. Here the lowest power consumption, 13.203 mW, is obtained on the Kintex-7 device, which also has the minimum hardware resource utilization (51 slices). As a proof of concept, the proposed SIMECK design is validated on a NEXYS 4 FPGA board with an Artix-7 device. The implementation results indicate that the proposed architecture is suitable for many resource-constrained cryptographic applications.
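The reported cycle count and clock frequencies can be combined into a rough per-block latency and throughput estimate. The sketch below is illustrative arithmetic only: the function names are hypothetical, the 16-cycle figure is read as applying to a single encryption or a single decryption, and the throughput formula assumes one 32-bit block completes every 16 cycles with no overlap between blocks. None of the derived values are stated in the abstract.

```python
# Case-one clock frequencies reported in the abstract (MHz).
FREQ_MHZ = {"Kintex-7": 382, "Virtex-7": 379, "Artix-7": 388}
CYCLES_PER_BLOCK = 16  # reported clock cycles per computation
BLOCK_BITS = 32        # SIMECK block length used in this work


def latency_ns(freq_mhz: float, cycles: int = CYCLES_PER_BLOCK) -> float:
    """Time to process one block, in nanoseconds (cycles / frequency)."""
    return cycles * 1000.0 / freq_mhz


def throughput_mbps(freq_mhz: float, cycles: int = CYCLES_PER_BLOCK,
                    block_bits: int = BLOCK_BITS) -> float:
    """Throughput in Mbit/s, ASSUMING one block finishes every `cycles` cycles."""
    return block_bits * freq_mhz / cycles


if __name__ == "__main__":
    for device, freq in FREQ_MHZ.items():
        print(f"{device}: {latency_ns(freq):.1f} ns per block, "
              f"{throughput_mbps(freq):.0f} Mbit/s")
    # Second case: all devices synthesized at 50 MHz.
    print(f"50 MHz: {latency_ns(50):.0f} ns per block, "
          f"{throughput_mbps(50):.0f} Mbit/s")
```

Under these assumptions, the 50 MHz case trades roughly an eightfold increase in per-block latency for the much lower reported power figure.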