Polar codes have been receiving increased attention for application in beyond-5G networks. They offer low-complexity decoding algorithms and can achieve the symmetric channel capacity. However, most research has focused on codes constructed from the binary kernel (the 2 × 2 polarization matrix), which restricts the code length to an integer power of 2. Multi-kernel polar codes have been proposed to allow the construction of polar codes whose lengths are not powers of 2 by mixing kernels of different dimensions. A hardware implementation based on the successive cancellation (SC) algorithm found in the literature suffers from long decoding latency. In this paper, we design and implement a multi-kernel decoder based on the fast simplified SC (fast-SSC) algorithm to decrease the decoding latency. It can decode any code constructed from binary and ternary (3 × 3) kernels, with flexible code length, code rate, and kernel sequence. FPGA implementation results show that, for a polar code of length N = 1536 and rate R = 1/2 with a processing element (P_e) value of P_e = 240, the decoder achieves 84.6% lower latency than the original SC algorithm. The architecture also supports polar codes constructed from purely binary and purely ternary kernels. A polar code of length N = 1024 and rate R = 1/2 with P_e = 120 achieves an information throughput of 432 Mbps.
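To illustrate how mixing binary and ternary kernels yields code lengths of the form N = 2^a · 3^b, the following minimal sketch builds a transformation matrix as a Kronecker product over GF(2). It is an illustration under stated assumptions, not the decoder architecture described here: the ternary kernel `T3` is one choice from the multi-kernel literature, and the function names and the left-to-right ordering of the kernel sequence are hypothetical conventions introduced only for this example.

```python
# Binary (Arikan) kernel.
T2 = [[1, 0],
      [1, 1]]
# Ternary kernel: an assumed example from the multi-kernel literature,
# not specified in this abstract.
T3 = [[1, 1, 1],
      [1, 0, 1],
      [0, 1, 1]]


def kron_gf2(a, b):
    """Kronecker product of two 0/1 matrices (entries stay in GF(2))."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i // rows_b][j // cols_b] & b[i % rows_b][j % cols_b]
         for j in range(len(a[0]) * cols_b)]
        for i in range(len(a) * rows_b)
    ]


def transform_matrix(kernel_sequence):
    """Build the N x N matrix for a sequence of kernel sizes (2 or 3).

    The sequence is applied left to right; this ordering is an assumption.
    """
    kernels = {2: T2, 3: T3}
    g = [[1]]
    for size in kernel_sequence:
        g = kron_gf2(g, kernels[size])
    return g


def kernel_counts(n):
    """Return (a, b) with n == 2**a * 3**b, or None if no such pair exists."""
    if n < 1:
        return None
    a = b = 0
    while n % 2 == 0:
        n //= 2
        a += 1
    while n % 3 == 0:
        n //= 3
        b += 1
    return (a, b) if n == 1 else None
```

With these definitions, `kernel_counts(1536)` returns `(9, 1)`, i.e. nine binary stages and one ternary stage, while `kernel_counts(1024)` returns `(10, 0)`, the purely binary case. A length such as 10 cannot be built from these two kernels, so `kernel_counts(10)` returns `None`.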