For single-domain proteins, we examine the completeness of the structures in the current Protein Data Bank (PDB) library for use in full-length model construction of unknown sequences. To address this issue, we employ a comprehensive benchmark set of 1,489 medium-size proteins that cover the PDB at the level of 35% sequence identity and identify templates by structure alignment. With homologous proteins excluded, we can always find folds similar to native, with an average rms deviation (RMSD) from native of 2.5 Å and ≈82% alignment coverage. These template structures often contain a significant number of insertions/deletions. The TASSER algorithm was applied to build full-length models, where continuous fragments are excised from the top-scoring templates and reassembled under the guidance of an optimized force field, which includes consensus restraints taken from the templates and knowledge-based statistical potentials. For almost all targets (except for 2/1,489), the resultant full-length models have an RMSD to native below 6 Å (97% of them below 4 Å). On average, the RMSD of full-length models is 2.25 Å, with aligned regions improved from 2.5 Å to 1.88 Å, comparable with the accuracy of low-resolution experimental structures. Furthermore, starting from state-of-the-art structural alignments, we demonstrate a methodology that can consistently bring template-based alignments closer to native. These results strongly suggest that the protein-folding problem can in principle be solved based on the current PDB library by developing efficient fold recognition algorithms that can recover such initial alignments.

As of December 30, 2003, >23,000 solved protein structures have been deposited in the Brookhaven Protein Data Bank (PDB) (1). This number keeps increasing, with ≈300 new entries added each month. The size and completeness of the PDB are essential to the success of template-based approaches to protein structure prediction.
These methods include comparative modeling (2, 3) and threading (4-7), which are designed to infer an unknown sequence's structure based on solved, similarly folded protein structures in the PDB. Because an accurate theory for the prediction of protein structure on the basis of physical principles does not yet exist, comparative modeling/threading approaches are the only reliable strategy for high-resolution tertiary structure prediction (8-10). On the other hand, the percentage of new folds among these new entries, that is, folds whose topology has never been seen in the current PDB library, keeps decreasing (e.g., the percentage of new folds was 27% in 1995 but 5% in 2001). The apparent saturation of new folds immediately raises an important question: At least for single-domain proteins, is the current structure library already complete enough to in principle solve the protein tertiary structure prediction problem at low-to-moderate resolutions?

By means of a variety of structure comparison tools (11-14), this issue has been partially addressed by many authors (15-20). It was demonstr...