The Z-score of a protein is defined as the energy separation between the native fold and the average of an ensemble of misfolds, in units of the standard deviation of the ensemble. The Z-score is often used to test knowledge-based potentials for their ability to distinguish the native fold from alternative structures. However, it is not known what range of values the Z-scores should take if the potential were correct. Here, we offer an estimate of Z-scores extracted from calorimetric measurements of proteins. The energies obtained from these experimental data are compared with those from computer simulations of a lattice model protein. We suggest that the Z-scores calculated from different knowledge-based potentials are generally too small in comparison with the experimental values.
Keywords: knowledge-based potentials; protein folding; Z-scores

Reprint requests to: Jeffrey Skolnick, Department of Molecular Biology, TPC-5, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037; e-mail: skolnick@scripps.edu.

In protein folding studies, knowledge-based potentials derived from a statistical analysis of known protein structures (Ueda et al., 1978; Maiorov & Crippen, 1992; Kolinski & Skolnick, 1994; Sippl, 1995; Mirny & Domany, 1996; Miyazawa & Jernigan, 1996; Liwo et al., 1997; Park et al., 1997) are frequently used in simplified models of proteins. The quality of such potentials is often assessed by so-called Z-scores, which test how well a potential distinguishes the native fold of a protein from an ensemble of misfolded structures. These Z-scores are calculated as (Bowie et al., 1991; Sippl, 1993):

$$Z_{\text{misfolds}} = \frac{\langle E \rangle_{\text{misfolds}} - E_{\text{native}}}{\sigma_{\text{misfolds}}}$$

where $E_{\text{native}}$ is the energy of the native structure of a protein, $\langle E \rangle_{\text{misfolds}}$ is the average energy of an ensemble of misfolded structures, and $\sigma_{\text{misfolds}}$ is the standard deviation of the energy in this ensemble.

However, it is not known what range of Z-score values should be expected for real proteins with exact potentials. Table 1 lists the Z-score ranges calculated from several typical knowledge-based potentials reported in the literature. The Z-scores vary tremendously from protein to protein. The widest spread was found for hydrophobic fitness potentials (Huang et al., 1996), which give Z-scores between 1 and 24 for similar-sized proteins, although the average Z-score is quite high, around 11. Indeed, because many approximations are introduced in deriving knowledge-based potentials, there is considerable uncertainty about their validity (Jones & Thornton, 1996; Pereira & Pochapsky, 1996; Thomas & Dill, 1996; Skolnick et al., 1997; Wang & Ben-Naim, 1997).
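To make the definition concrete, the following is a minimal Python sketch of the Z-score described above: the average misfold energy minus the native energy, in units of the ensemble's standard deviation. It is an illustration under stated assumptions, not the procedure used by any of the cited potentials. The sign is chosen so that Z is positive when the native structure lies below the misfold average. The standard deviation is taken as the population value. The function name `z_score` and the example energies are hypothetical.

```python
import statistics
from typing import Sequence


def z_score(e_native: float, misfold_energies: Sequence[float]) -> float:
    """Z-score of a native structure against an ensemble of misfolds.

    Assumed sign convention: Z = (<E>_misfolds - E_native) / sigma_misfolds,
    so Z > 0 when the native energy lies below the misfold average.
    sigma is the population standard deviation of the ensemble (an
    assumption; the sample standard deviation is an equally common choice).
    """
    if len(misfold_energies) < 2:
        raise ValueError("need at least two misfolded structures")
    mean_e = statistics.fmean(misfold_energies)  # <E>_misfolds
    sigma = statistics.pstdev(misfold_energies)  # sigma_misfolds
    if sigma == 0.0:
        raise ValueError("misfold energies have zero spread; Z is undefined")
    return (mean_e - e_native) / sigma


if __name__ == "__main__":
    # Hypothetical energies in arbitrary units, for illustration only.
    print(z_score(-10.0, [-4.0, -6.0, -4.0, -6.0]))  # 5.0
```

In this toy ensemble the misfold mean is -5 and the standard deviation is 1. A native energy of -10 therefore sits five standard deviations below the ensemble average. Because Z depends on both the gap and the spread, two potentials with the same native-to-average gap can give very different Z-scores if their misfold energy distributions differ in width.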
This conflicting situation calls for an examination of the physical meaning of Z-scores.

In this paper, we suggest a means of extracting Z-scores from experimental data on proteins. Such Z-scores are defined differently from $Z_{\text{misfolds}}$, but the two quantities can be related. Thus, it is hoped that the results presented here can provide guidance into reaso...