The crystallographic reliability index R_complete is based on a method proposed more than two decades ago. Because its calculation is computationally expensive, it was not widely adopted by the crystallographic community, which instead favored the cross-validation method known as R_free. The importance of R_free has grown beyond that of a pure validation tool. However, its application requires a sufficiently large dataset. In this work we assess the reliability of R_complete and compare it with k-fold cross-validation, bootstrapping, and jackknifing. As opposed to proper cross-validation as realized with R_free, R_complete relies on a method of reducing bias from the structural model. We compare two different methods of reducing model bias and question the widespread notion that random parameter shifts are required for this purpose. We show that R_complete has as little statistical bias as R_free, with the benefit of a much smaller variance. Because the calculation of R_complete is based on the entire dataset instead of a small subset, it allows the estimation of maximum likelihood parameters even for small datasets. R_complete enables maximum likelihood-based refinement to be extended to virtually all areas of crystallographic structure determination, including high-pressure studies, neutron diffraction studies, and datasets from free electron lasers.

structure determination | reliability index | maximum likelihood refinement | overfitting | model bias

The quality of crystallographic models is described by several quality indicators. For both small-molecule and macromolecular structure deposition, the crystallographic reliability index R1 must be provided (1, 2). It is calculated for the dataset H of observations and a structural model as

$$R1 = \frac{\sum_{h \in H} \bigl|\, |F_{\mathrm{obs}}(h)| - |F_{\mathrm{calc}}(h)| \,\bigr|}{\sum_{h \in H} |F_{\mathrm{obs}}(h)|}$$

where F_obs(h) and F_calc(h) are the observed and calculated structure factors of reflection h. Depending on the data-to-parameter ratio, R1 is affected by more or less severe overfitting (3, 4). To overcome this problem, cross-validation was introduced into crystallography (5-9).
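As a minimal sketch of how R1 is evaluated, the snippet below assumes the standard definition: the sum of absolute differences between observed and calculated structure-factor amplitudes, divided by the sum of observed amplitudes. The function name `r_factor` and the four amplitudes are made up for illustration and are not taken from the paper.

```python
def r_factor(f_obs, f_calc):
    """Return sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over paired reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    return numerator / denominator


# Hypothetical amplitudes for four reflections (illustration only).
f_obs = [10.0, 20.0, 30.0, 40.0]
f_calc = [11.0, 18.0, 33.0, 40.0]

r1 = r_factor(f_obs, f_calc)
print(f"R1 = {r1:.3f}")
```

The same function gives a cross-validated value when it is applied only to reflections that were withheld from refinement, since only the set of reflections in the sums changes.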
For cross-validation in crystallography, a certain fraction of the observations, typically 5-10%, are withheld as test set T and never used for model building and refinement. They are used only to calculate the reliability index R_free:

$$R_{\mathrm{free}} = \frac{\sum_{h \in T} \bigl|\, |F_{\mathrm{obs}}(h)| - |F_{\mathrm{calc}}(h)| \,\bigr|}{\sum_{h \in T} |F_{\mathrm{obs}}(h)|}$$

R_free is much less affected by overfitting, and since its introduction it has gained importance beyond validation of the structural model. It is used to optimize weights for restrained refinement (4, 10-13). The concept of R_free paved the way for maximum likelihood methods in crystallography. It was shown that the estimation of maximum likelihood parameters based on the test set T provides much better accuracy than that based on the data used during refinement (14-16).

Cross-validation reduces the bias of a statistic (17, 18) but can show large variance, especially when T is small (8, 17). The relative error of the crystallographic R_free was established as

$$\frac{\sigma(R_{\mathrm{free}})}{R_{\mathrm{free}}} = \frac{1}{\sqrt{2|T|}}$$

(19). The test set should hold at least 500 data points so that σ(R_free)/R_free ≤ 0.032. Maximum likelihood methods estimate parameters in resolution bins, and a total of |T| ...
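The relative-error relation quoted from ref. 19, σ(R_free) = R_free / sqrt(2|T|), can be checked numerically. The sketch below uses hypothetical helper names, and the R_free value of 0.25 is a made-up example rather than a result from the paper.

```python
import math


def rfree_relative_error(n_test):
    """Relative error sigma(R_free) / R_free = 1 / sqrt(2 |T|)."""
    if n_test <= 0:
        raise ValueError("test set must contain at least one reflection")
    return 1.0 / math.sqrt(2.0 * n_test)


def rfree_sigma(r_free, n_test):
    """Absolute error sigma(R_free) for a given R_free and test-set size."""
    return r_free * rfree_relative_error(n_test)


def min_test_set_size(max_relative_error):
    """Smallest |T| with 1 / sqrt(2 |T|) <= max_relative_error."""
    return math.ceil(1.0 / (2.0 * max_relative_error ** 2))


rel_500 = rfree_relative_error(500)   # about 0.0316, within the 0.032 bound
sigma_example = rfree_sigma(0.25, 500)  # hypothetical R_free of 0.25
n_min = min_test_set_size(0.032)

print(f"relative error for |T| = 500: {rel_500:.4f}")
print(f"sigma(R_free) for R_free = 0.25: {sigma_example:.4f}")
print(f"smallest |T| meeting 0.032: {n_min}")
```

At a 5% test fraction, 500 test reflections correspond to about 10,000 reflections in total, which illustrates why R_free becomes hard to apply to small datasets.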
Electron diffraction enables structure determination of organic small molecules using crystals that are too small for conventional X-ray crystallography. However, because of uncertainties in the experimental parameters, notably the detector distance, the unit-cell parameters and the geometry of the structural models are typically less accurate and precise compared with results obtained by X-ray diffraction. Here, an iterative procedure to optimize the unit-cell parameters obtained from electron diffraction using idealized restraints is proposed. The cell optimization routine has been implemented as part of the structure refinement, and a gradual improvement in lattice parameters and data quality is demonstrated. It is shown that cell optimization, optionally combined with geometrical corrections for any apparent detector distortions, benefits refinement of electron diffraction data in small-molecule crystallography and leads to more accurate structural models.