There is a large body of evidence that many protein−ligand cocrystal structures contain poorly refined ligand geometries. These errors result in bound structures that have nonideal bond lengths and angles, are strained, contain improbable conformations, and have bad protein−ligand contacts. Many of these problems can be greatly reduced with better refinement models.

KEYWORDS: Protein−ligand cocrystal structure, induced fit, structure-based design, structure refinement, bound ligand strain

The ability to accurately determine the 3D structure of protein−ligand complexes using X-ray crystallography has provided an important tool for drug discovery. The number of publicly available structures in the RCSB PDB (www.pdb.org) has grown to almost 100,000, and this does not include the many thousands of proprietary structures that have been determined. During this growth, numerous studies have raised concerns about the fidelity of many of these structures with regard to the bound ligand. 1−4 They all find that many bound ligands in the PDB incorporate a surprising amount of internal strain. Further, inspection of these structures reveals a litany of distorted rings, bad contacts, and unusual conformations or configurations. In short, the prevailing literature suggests that current refinement procedures often do a poor job of correctly refining the bound ligand.
■ EVIDENCE OF A PROBLEM

To give just a few examples: 1xqd contains three planar oxygens as part of a phosphate group; 1pme features a planar sulfur in the sulfoxide; 1tnk, a 1.8 Å resolution structure, contains a nonplanar tetrahedral aromatic carbon as part of a substituted aniline; and 4g93 contains an olefin that is twisted nearly 90° out of the plane. While it is surprising that such egregious chemical structures could find their way into the literature, much less the PDB, we might dismiss them as anomalies. However, the truth is that any systematic evaluation of the bound ligands in the PDB will uncover countless less dramatic, albeit still serious, structural errors. While the tone of some early work leans toward attempting to explain this phenomenon, 1 there has gradually been a widespread realization that induced fit cannot explain the large number of strained and distorted ligands.

Studies of bound ligand strain commonly entail assembling a selection of cocrystal structures from the PDB, extracting the bound ligand, and then optimizing the ligand outside the confines of the protein active site. While this sounds simple, there are many important computational details that can have a significant effect on the results. Examples include the selection of the bound and free reference states, the inclusion of solvation (or other medium effects), and the model employed (e.g., force field or quantum mechanical). The most common measures of bound ligand strain are usually referred to as local or global strain as defined in eqs 1 and 2, respectively.

The first definition (eq 1) is problematic in that it demands extensive conformational analysis ...
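
As a rough illustration of the two strain measures named above, the following minimal Python sketch assumes the conventional definitions: local strain is taken as the energy of the bound conformation minus the energy of the nearest local minimum reached by optimizing the extracted ligand, and global strain as the bound energy minus the lowest energy found in a conformational search. The equations themselves are not reproduced in this section, so these definitions, the function names, and the energy values are all assumptions made for illustration, and no mapping onto eq 1 or eq 2 is implied.

```python
from typing import Sequence


def local_strain(e_bound: float, e_local_min: float) -> float:
    """Assumed definition: E(bound) - E(nearest local minimum).

    Both energies must come from the same model (same force field or
    quantum method, same solvation treatment) and share one unit.
    """
    return e_bound - e_local_min


def global_strain(e_bound: float, conformer_energies: Sequence[float]) -> float:
    """Assumed definition: E(bound) - E(lowest-energy conformer found).

    The result is only as reliable as the conformational search that
    produced `conformer_energies`; a missed global minimum leads to an
    underestimate of the strain.
    """
    if not conformer_energies:
        raise ValueError("at least one conformer energy is required")
    return e_bound - min(conformer_energies)


if __name__ == "__main__":
    # Hypothetical energies in kcal/mol, not taken from any real structure.
    e_bound = 12.5
    e_local_min = 9.0
    search_energies = [9.0, 7.25, 8.1]

    print("local strain: ", local_strain(e_bound, e_local_min))
    print("global strain:", global_strain(e_bound, search_energies))
```

Under these assumptions, the global value can never be smaller than the local one as long as the local minimum is included among the searched conformers, which is why the choice of reference state matters when strain values from different studies are compared.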