Explaining the mismatch between the timing behavior predicted by modeling and simulation and the timing behavior measured on silicon chips can be very challenging. Given a list of potential sources, the mismatch may be the aggregate effect of several of them acting individually and in combination, which results in a very large search space. Furthermore, observed data are always corrupted by unknown random noise. To overcome both challenges, this paper proposes a statistical diagnosis framework that formulates diagnosis as a regression learning problem. In this framework, the objective is to rank a set of features corresponding to the potential sources of concern. The ranking is based on measured silicon path delay data, such that a feature inducing a larger unexpected timing deviation is ranked higher. Experimental results are presented to explain the learning method. Diagnosis effectiveness is demonstrated through benchmark experiments and on an industrial design.
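
To make the regression formulation concrete, the following is a minimal sketch on synthetic data, not the learning method used in this paper. It assumes that each path is described by how many times each candidate source (a cell type, wire layer, or via) appears on it, that the deviation to be explained is the measured delay minus the predicted delay, and that ordinary least squares can stand in for the regression learner. All names (`rank_features`, `cell_B`, `wire_M3`, and so on) and all numeric values are hypothetical.

```python
import numpy as np

def rank_features(X, predicted, measured, names):
    """Rank features by the magnitude of their fitted per-unit timing deviation."""
    deviation = measured - predicted                 # unexpected delay per path
    # Ordinary least squares is an assumed stand-in for the regression learner.
    coef, *_ = np.linalg.lstsq(X, deviation, rcond=None)
    order = np.argsort(-np.abs(coef))                # largest deviation first
    return [(names[i], float(coef[i])) for i in order]

# Synthetic example: 200 paths, 4 hypothetical candidate sources.
rng = np.random.default_rng(0)
names = ["cell_A", "cell_B", "wire_M3", "via_V2"]
n_paths = 200

# X[p, j] = number of times source j appears on path p (assumed encoding).
X = rng.integers(0, 6, size=(n_paths, len(names))).astype(float)

model_delay = np.array([10.0, 12.0, 5.0, 2.0])       # per-unit delay in the timing model
true_shift = np.array([0.0, 1.5, 0.2, 0.0])          # hidden unmodeled deviation per unit

predicted = X @ model_delay
# Measured delays include the hidden shift plus random measurement noise.
measured = X @ (model_delay + true_shift) + rng.normal(0.0, 0.5, n_paths)

ranking = rank_features(X, predicted, measured, names)
for name, shift in ranking:
    print(f"{name}: estimated deviation per unit = {shift:+.3f}")
```

In this toy setting the source with the largest injected shift is ranked first despite the added noise. Real silicon data would typically call for a more robust learner and careful feature construction.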