We have developed TASSER, a hierarchical approach to protein structure prediction that consists of template identification by threading, followed by tertiary structure assembly via the rearrangement of continuous template fragments guided by an optimized Cα and side-chain-based potential driven by threading-based, predicted tertiary restraints. TASSER was applied to a comprehensive benchmark set of 1,489 medium-sized proteins in the Protein Data Bank. With homologues excluded, in 927 cases, the templates identified by our threading algorithm PROSPECTOR 3 have a rms deviation from native <6.5 Å with ≈80% alignment coverage. After template reassembly, this number increases to 1,172. This shows significant and systematic improvement of the final models with respect to the initial template alignments. Furthermore, significant improvements in loop modeling are demonstrated. We then apply TASSER to the 1,360 medium-sized ORFs in the Escherichia coli genome; ≈920 can be predicted with high accuracy based on confidence criteria established in the Protein Data Bank benchmark. These results from our unprecedented comprehensive folding benchmark on all protein categories provide a reliable basis for the application of TASSER to structural genomics, especially to proteins of low sequence identity to solved protein structures.

Despite considerable effort, the prediction of the native structure of a protein from its amino acid sequence remains an outstanding unsolved problem. In this postgenomic era, because protein structure can assist in functional annotation, the need for progress is even more crucial (1, 2). Historically, protein structure prediction divides into three categories: comparative modeling (CM) (3, 4), threading (5, 6), and new fold prediction (7-9). In CM, the protein structure is predicted by aligning the target sequence to an evolutionarily related, solved template structure.
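The benchmark figures quoted above rest on two quantities: the rms deviation (rmsd) of the aligned residues from the native structure, and the fraction of the target covered by the alignment. As an illustration only, the sketch below computes both for Cα coordinates using the standard Kabsch superposition. The function names, the `pairs` alignment format, and the use of NumPy are assumptions made for this example; the source does not describe the exact superposition procedure used in the benchmark.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Kabsch: SVD of the 3x3 covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = (R @ Pc.T).T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def aligned_rmsd_and_coverage(template_ca, native_ca, pairs):
    """Return (rmsd, coverage) for an alignment.

    pairs: list of (template_index, native_index) tuples; a hypothetical
    format chosen for this sketch, not the output format of any threading tool.
    coverage: aligned residues divided by the native chain length.
    """
    template_ca = np.asarray(template_ca, dtype=float)
    native_ca = np.asarray(native_ca, dtype=float)
    t_idx = [t for t, _ in pairs]
    n_idx = [n for _, n in pairs]
    rmsd = kabsch_rmsd(template_ca[t_idx], native_ca[n_idx])
    coverage = len(pairs) / len(native_ca)
    return rmsd, coverage
```

Under these assumptions, a template alignment would count toward the 927 cases if the returned rmsd is below 6.5 Å; the coverage value is reported alongside it rather than used as a cutoff, because the text gives ≈80% as a typical coverage and not as a threshold.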
Threading goes beyond CM in that it is designed to match sequences to proteins adopting similar folds, where the target and template sequences need not be evolutionarily related. Finally, for new folds, the target sequence could adopt a structure not seen before, and modeling should be done ab initio. This is the hardest category, with the lowest prediction accuracy.

For CM/threading methods, the most robust of the protein structure prediction approaches, there are three main issues. First, a necessary precondition for their success is the completeness of the library of solved structures in the Protein Data Bank (PDB) (10). Recently, it was demonstrated that the PDB library is most likely complete for single-domain protein structures at low to moderate resolution (11); e.g., for any given protein up to 100 residues, regardless of whether it is evolutionarily related to other solved protein structures, at least one already solved structure exists in the PDB that has a rms deviation (rmsd) from native <4 Å for 90% of its residues. This strongly suggests that the protein structure prediction problem can in principle be s...