Abstract-Energy-efficient implementation of high-speed softoutput trellis decoders is of great practical importance. This paper first presents an algorithm-level technique, referred to as quasi-reduced-state trellis decoding, that enables the use of reduced-state trellis decoding concept to reduce the energy consumption of decoding data storage without incurring any speed penalty. Then we propose to integrate this algorithmlevel technique with an importance-aware clock skew scheduling approach that enables the use of aggressive voltage overscaling in decoding computation datapath at the cost of small decoding performance degradation. The integration of these two techniques can provide a wide and flexible design space to explore the decoding performance vs. decoding energy consumption trade-off for very high-speed soft-output trellis decoder implementations. The effectiveness has been demonstrated through 1Gbps softoutput Viterbi algorithm (SOVA) decoder ASIC design at 65nm technology node.