The REGDIA regression diagnostics algorithm in S-Plus is introduced in order to examine the accuracy of pK(a) predictions made with four updated programs: PALLAS, MARVIN, ACD/pKa and SPARC. This report reviews the current status of computational tools for predicting the pK(a) values of organic drug-like compounds. Outlying predicted pK(a) values correspond to molecules that are poorly characterized by the pK(a) prediction program concerned. The statistical detection of outliers can fail because of masking and swamping effects. The Williams graph was selected because it gave the most reliable detection of outliers. Six statistical characteristics (F(exp), R(2), R(P)(2), MEP, AIC, and s(e) in pK(a) units) were examined for the results obtained when the four selected pK(a) prediction algorithms were applied to three datasets. The highest values of F(exp), R(2) and R(P)(2), the lowest values of MEP and s(e), and the most negative AIC were found with the ACD/pKa algorithm, indicating that, of the four programs, it has the best predictive power and gives the most accurate results. The proposed accuracy test performed by the REGDIA program can also be applied to test the accuracy of other predicted values, such as log P, log D, aqueous solubility or certain physicochemical properties of drug molecules.
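The six statistics and the Williams graph coordinates named above can be illustrated with a small sketch. This is not the REGDIA program: it assumes a straight-line fit of predicted on experimental pK(a), uses common textbook definitions (leave-one-out predicted residuals for MEP and R(P)(2), AIC = n ln(RSS/n) + 2m, jackknife residuals plotted against leverage), and takes a hypothetical residual cut-off of 2.0 where a t-quantile would normally be used. The data are synthetic, with one deliberately corrupted prediction.

```python
import math

def fit_and_diagnose(x, y, resid_cutoff=2.0):
    """Least-squares fit y = b0 + b1*x with summary statistics and Williams points.

    resid_cutoff is a hypothetical threshold for |jackknife residual|;
    the leverage threshold 2m/n is the usual rule of thumb.
    """
    n, m = len(x), 2                      # m = number of fitted parameters
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar

    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]   # hat-matrix diagonal
    rss = sum(e * e for e in resid)
    tss = sum((yi - ybar) ** 2 for yi in y)

    s_e = math.sqrt(rss / (n - m))                        # residual std. deviation
    r2 = 1 - rss / tss
    mep = sum((e / (1 - h)) ** 2 for e, h in zip(resid, lev)) / n
    r2_p = 1 - n * mep / tss                              # predicted R squared
    f_exp = (r2 / (m - 1)) / ((1 - r2) / (n - m))
    aic = n * math.log(rss / n) + 2 * m

    # Williams graph: (leverage, jackknife residual) for every point
    williams = []
    for e, h in zip(resid, lev):
        e_s = e / (s_e * math.sqrt(1 - h))                # standardized residual
        e_j = e_s * math.sqrt((n - m - 1) / (n - m - e_s ** 2))
        williams.append((h, e_j))

    return {
        "F_exp": f_exp, "R2": r2, "R2_P": r2_p, "MEP": mep, "AIC": aic, "s_e": s_e,
        "williams": williams,
        "outliers": [i for i, (_, ej) in enumerate(williams) if abs(ej) > resid_cutoff],
        "high_leverage": [i for i, (h, _) in enumerate(williams) if h > 2 * m / n],
    }

# Synthetic illustration only: the sixth "prediction" is off by 2.5 pK units.
pka_exp = [2.0, 3.1, 4.2, 4.8, 5.5, 6.3, 7.1, 8.0, 9.2, 10.1]
errors = [0.05, -0.08, 0.10, -0.04, 0.02, 2.50, -0.06, 0.07, -0.03, 0.04]
pka_pred = [x + d for x, d in zip(pka_exp, errors)]

result = fit_and_diagnose(pka_exp, pka_pred)
print({k: round(v, 3) for k, v in result.items() if isinstance(v, float)})
print("outliers:", result["outliers"], "high leverage:", result["high_leverage"])
```

Comparing programs under these assumptions would amount to running the same function on each program's predictions and ranking the returned statistics; with several strong outliers, masking can still hide some of them from a single pass.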
When drugs are poorly soluble, pH-spectrophotometric titration combined with nonlinear regression of the absorbance response surface data can be used instead of the potentiometric determination of dissociation constants. Generally, regression models are extremely useful for extracting the essential features from a multiwavelength set of data. Regression diagnostics are procedures for examining the regression triplet (data, model, method) in order to check (a) the data quality for a proposed model; (b) the model quality for a given set of data; and (c) that all of the assumptions used for least squares hold. In the interactive, PC-assisted diagnosis of data, models and estimation methods, the examination of data quality involves the detection of influential points, outliers and high leverages, which cause many problems when fitting the absorbance response hyperplane by regression. All graphically oriented techniques are suitable for the rapid estimation of influential points. The reliability of the dissociation constants for the acid drug silybin may be proven with goodness-of-fit tests of the multiwavelength spectrophotometric pH-titration data. The uncertainty in the measurement of the pK(a) of a weak acid obtained by the least squares nonlinear regression analysis of absorption spectra is calculated. The procedure takes into account the drift in pH measurement, the drift in spectral measurement, and all of the drifts in analytical operations, as well as the relative importance of each source of uncertainty. The most important source of uncertainty in the experimental set-up for the example is the uncertainty in the pH measurement. The influences of various sources of uncertainty on the accuracy and precision are discussed using the example of the mixed dissociation constants of silybin, obtained using the SQUAD(84) and SPECFIT/32 regression programs.
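The idea of weighing each source of uncertainty can be sketched with a much simpler model than the multiwavelength regression described above. The example below assumes a single-wavelength relation for a monoprotic weak acid, pK(a) = pH + log10((A_A - A) / (A - A_HA)), uncorrelated inputs, and first-order propagation with numerically estimated sensitivities. All numerical values and names (`A_HA`, `A_A`, the standard uncertainties) are hypothetical and are not taken from the silybin study.

```python
import math

def pka_single_wavelength(pH, A, A_HA, A_A):
    """pK(a) from one absorbance reading (simplified single-wavelength model).

    A_HA and A_A are the limiting absorbances of the acid and base forms.
    """
    return pH + math.log10((A_A - A) / (A - A_HA))

def uncertainty_budget(values, uncerts, step=1e-6):
    """Combined standard uncertainty of pK(a) and each input's share of the variance.

    Sensitivities are central finite differences; inputs are assumed uncorrelated.
    """
    variances = {}
    for name in values:
        up = dict(values)
        up[name] += step
        down = dict(values)
        down[name] -= step
        sens = (pka_single_wavelength(**up) - pka_single_wavelength(**down)) / (2 * step)
        variances[name] = (sens * uncerts[name]) ** 2
    total = sum(variances.values())
    shares = {name: v / total for name, v in variances.items()}
    return math.sqrt(total), shares

# Hypothetical readings and standard uncertainties, for illustration only.
values = {"pH": 6.50, "A": 0.45, "A_HA": 0.20, "A_A": 0.80}
uncerts = {"pH": 0.02, "A": 0.002, "A_HA": 0.002, "A_A": 0.002}

pka = pka_single_wavelength(**values)
u_pka, shares = uncertainty_budget(values, uncerts)
print(f"pKa = {pka:.3f}, combined standard uncertainty = {u_pka:.3f}")
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:5s} {100 * share:5.1f} % of variance")
```

With these assumed inputs the pH term dominates the budget, which matches the qualitative statement in the text, but the split depends entirely on the uncertainties chosen here.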