At present, the Viterbi algorithm (VA) is widely used in communication systems for decoding and equalization. The achievable speed of conventional Viterbi decoders (VD's) is limited by the inherent nonlinear add-compare-select (ACS) recursion. The aim of this paper is to describe system design and VLSI implementation of a complex system of fabricated ASIC's for high speed Viterbi decoding using the "minimized method" (MM) parallelized VA. We particularly emphasize the interaction between system design, architecture and VLSI implementation as well as system partitioning issues and the resulting requirements for the system design flow. Our design objectives were 1) to achieve the same decoding performance as a Conventional VD using the parallelized algorithm, 2) to achieve a speed of more than 1 Gb/s, and 3) to realize a system for this task using a single cascadable ASIC. With a minimum system configuration of four identical ASIC's produced by using 1.0 p CMOS technology, the design objective of a decoding speed of 1.2 Gb/s is achieved. This means, compared to previous implementations of Viterbi decoders, the speed is increased by an order of magnitude.