Determining the structure and function of a novel protein is a cornerstone of many aspects of modern biology. Over the past decades, a number of computational tools for structure prediction have been developed. It is critical that the biological community is aware of such tools and is able to interpret their results in an informed way. This protocol provides a guide to interpreting the output of structure prediction servers in general and one such tool in particular, the protein homology/analogy recognition engine (Phyre). New profile-profile matching algorithms have improved structure prediction considerably in recent years. Although the performance of Phyre is typical of many structure prediction systems using such algorithms, all these systems can reliably detect up to twice as many remote homologies as standard sequence-profile searching. Phyre is widely used by the biological community, with >150 submissions per day, and provides a simple interface to results. Phyre takes 30 min to predict the structure of a 250-residue protein.
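The following Python sketch contrasts sequence-profile and profile-profile searching. It scores a query against a template profile column by column and then finds the best local (Smith-Waterman) alignment. This is a toy illustration under stated assumptions, not Phyre's scoring function:

- Profiles are built from ungapped alignments.
- Columns are compared with a hypothetical dot-product score, scaled by 10 and offset by 1 so that unrelated columns score negatively.
- Gaps carry a linear penalty.

A sequence-profile search is the special case in which the query is a single sequence, represented here as a one-hot profile. All function names are illustrative.

```python
def profile_from_alignment(seqs):
    """Per-column residue frequencies for an ungapped multiple alignment."""
    profile = []
    for column in zip(*seqs):
        counts = {}
        for aa in column:
            counts[aa] = counts.get(aa, 0) + 1
        profile.append({aa: n / len(column) for aa, n in counts.items()})
    return profile


def one_hot(seq):
    """A single sequence viewed as a profile (frequency 1.0 for the observed residue)."""
    return [{aa: 1.0} for aa in seq]


def column_score(p, q, scale=10.0, offset=1.0):
    """Hypothetical column-column score: scaled dot product of frequencies minus an offset."""
    dot = sum(freq * q.get(aa, 0.0) for aa, freq in p.items())
    return scale * dot - offset


def local_align_score(query, template, gap=2.0):
    """Best Smith-Waterman local alignment score between two profiles (linear gap penalty)."""
    rows, cols = len(query) + 1, len(template) + 1
    H = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                0.0,  # start a new local alignment here
                H[i - 1][j - 1] + column_score(query[i - 1], template[j - 1]),
                H[i - 1][j] - gap,  # query column aligned to a gap in the template
                H[i][j - 1] - gap,  # template column aligned to a gap in the query
            )
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    template = profile_from_alignment(["SCNE", "SCNE"])
    query_family = profile_from_alignment(["ACDE", "SCNE"])
    print(local_align_score(one_hot("ACDE"), template))  # sequence-profile: 17.0
    print(local_align_score(query_family, template))     # profile-profile: 26.0
```

In this example the query family records that position 1 can be A or S and position 3 can be D or N. The profile-profile comparison therefore rewards those positions against a template that uses S and N there, whereas the single query sequence is penalized at both. Under these toy assumptions, this is one intuition for why profile-profile methods can detect more remote homologies.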
INTRODUCTION
At present, over six million unique protein sequences have been deposited in the public databases, and this number is growing rapidly (http://www.ncbi.nlm.nih.gov/RefSeq/). Meanwhile, despite the progress of high-throughput structural genomics initiatives, just over 50,000 protein structures have so far been experimentally determined. This enormous disparity between the number of sequences and structures has driven research toward computational methods for predicting protein structure from sequence. Computational methods grounded in simulation of the folding process using only the sequence itself as input (the so-called ab initio or de novo approaches) have been pursued for decades and are showing some progress 1. However, in general, these methods are either computationally intractable or perform poorly on all but the smallest proteins (<100 amino acids) 1.

The most successful general approach for predicting the structure of proteins involves the detection of homologs of known three-dimensional (3D) structure, the so-called template-based homology modeling or fold recognition. These methods rely on the observation that the number of folds in nature appears to be limited and that many different remotely homologous protein sequences adopt remarkably similar structures 2. Thus, given a protein sequence of interest, one may compare this sequence with the sequences of proteins whose structures have been experimentally determined. If a homolog can be found, an alignment of the two sequences can be generated and used directly to build a 3D model of the sequence of interest.
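The core of template-based modeling can be illustrated with a minimal Python sketch. It aligns the query to a template sequence with Needleman-Wunsch global alignment, then copies each aligned template residue's Cα coordinates onto the corresponding query residue. This is a simplified toy, not Phyre's pipeline:

- Scoring uses a hypothetical match/mismatch/linear-gap scheme rather than a substitution matrix or profiles.
- Only one coordinate per residue is transferred.
- Query residues with no template partner are left as None; a full modeling program would have to build them by other means.

The function names and the coordinates in the example are made up for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b; returns aligned (i, j) index pairs, gaps omitted."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub, S[i - 1][j] + gap, S[i][j - 1] + gap)
    # Trace back from the bottom-right cell, recording only residue-residue pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if S[i][j] == S[i - 1][j - 1] + sub:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] + gap:
            i -= 1  # query residue aligned to a gap
        else:
            j -= 1  # template residue aligned to a gap
    pairs.reverse()
    return pairs


def transfer_coordinates(query, template_seq, template_coords):
    """Copy template C-alpha coordinates onto aligned query residues; None where unaligned."""
    model = [None] * len(query)
    for qi, tj in needleman_wunsch(query, template_seq):
        model[qi] = template_coords[tj]
    return model


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    print(needleman_wunsch("ACDEF", "ACEF"))
    print(transfer_coordinates("ACDEF", "ACEF", coords))
```

Here the query's D has no counterpart in the template, so its position in the model stays None. Every other residue inherits the coordinates of its aligned template residue.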
The practical applications of protein structure prediction are many and varied, including guiding the development of functional hypotheses about hypothetical proteins 3, improving phasing signals in crystallography 4, selecting sites for mutagenesis 5 and the rational design of drugs 6.

Every 2 years an international blind trial of protein structure prediction techniques is held (Critical Assessmen...