The ever-increasing parametric variations in the latest nanometer technologies pose a severe reliability challenge for VLSI design. Specifically, technology scaling leads to increasing performance variation even when the average performance improves. The traditional VLSI design methodology requires that all computation in a logic stage complete within one clock cycle, which hinders further performance improvement. Alternatively, allowing computation in a logic stage to complete in a variable number of clock cycles improves average performance and enables further power reduction. In this paper, we present a generic variable-latency design methodology, which includes timing analysis, delay test input generation, design of a completion prediction unit for logic computation latency, and a clock gating scheme. Our experiments based on the 45nm Nangate open cell library and the des MCNC benchmark circuit show that, for a clock gating occurrence probability of 6.25%, our technique achieves maximum area reductions of 8.29%, 9.96%, and 9.18%, and power reductions of 27.54%, 28.08%, and 29.93%, with prediction units of 4, 5, and 6 inputs predicting the top 1, 2, and 4 timing-critical paths, respectively.
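To make the variable-latency mechanism concrete, the following is a minimal behavioral sketch in Python, not the paper's actual design: names such as predict_long_latency and CRITICAL_PATTERN are hypothetical, and the single watched pattern is an assumption chosen so that a 4-input prediction unit fires with probability 1/16 = 6.25%, matching the gating probability quoted above. The prediction unit watches a few primary inputs; when it predicts that a timing-critical path may be exercised, the clock is gated for one extra cycle so the computation gets two cycles to settle, and otherwise the stage completes in a single cycle.

    # Behavioral sketch of prediction-based variable-latency clocking.
    # Hypothetical illustration only: the predictor structure, input width,
    # and watched pattern are assumptions for demonstration.

    import random

    PREDICTOR_INPUTS = 4             # prediction unit watches 4 primary inputs
    CRITICAL_PATTERN = (1, 1, 0, 1)  # assumed pattern that may activate the
                                     # top timing-critical path

    def predict_long_latency(inputs):
        """Completion prediction unit: flag input vectors that could exercise
        a critical path. One 4-input pattern fires with probability
        1/16 = 6.25%, the clock gating occurrence probability above."""
        return tuple(inputs[:PREDICTOR_INPUTS]) == CRITICAL_PATTERN

    def run_stage(num_vectors=100_000, seed=0):
        """Average cycles per operation for a variable-latency stage:
        2 cycles when the predictor fires (clock gated one extra cycle),
        1 cycle otherwise."""
        rng = random.Random(seed)
        total_cycles = 0
        gated = 0
        for _ in range(num_vectors):
            vector = [rng.randint(0, 1) for _ in range(16)]
            if predict_long_latency(vector):
                total_cycles += 2   # hold result registers a second cycle
                gated += 1
            else:
                total_cycles += 1   # normal single-cycle operation
        return total_cycles / num_vectors, gated / num_vectors

    avg_cycles, gating_prob = run_stage()
    print(f"average cycles/op: {avg_cycles:.4f}, "
          f"gating probability: {gating_prob:.4f}")

Under these assumptions the stage averages roughly 1.06 cycles per operation, which illustrates why the clock period can be set by typical paths rather than the worst-case path: the rare long-latency computations pay one extra cycle instead of forcing a slower clock on every operation.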