The acid dissociation constant (pKa) has an important influence on molecular properties crucial to compound development in synthesis, formulation, and optimization of absorption, distribution, metabolism, and excretion properties. We present a method that combines quantum mechanical calculations, at a semi-empirical level of theory, with machine learning to accurately predict pKa for a diverse range of mono- and polyprotic compounds. The resulting model has been tested on two external data sets, one specifically used to test pKa prediction methods (SAMPL6) and the second covering known drugs containing basic functionalities. Both sets were predicted with excellent accuracy (root-mean-square errors of 0.7–1.0 log units), comparable to other methodologies that use a much higher level of theory at greater computational cost.
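The accuracy figure quoted above is a root-mean-square error (RMSE) in log units. As a minimal sketch of how such a figure is computed, the following uses made-up pKa values; the function name `rmse` and the numbers are assumptions for illustration, not data or code from the study.

```python
import math

def rmse(predicted, experimental):
    """Root-mean-square error between predicted and experimental pKa values."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical values, for illustration only (not data from the study).
experimental = [4.0, 9.0, 7.0, 10.0]
predicted = [5.0, 8.0, 8.0, 9.0]
error = rmse(predicted, experimental)
print(f"RMSE = {error:.2f} log units")
```

Because the squared errors are averaged before the square root is taken, a few large misses raise the RMSE more than many small ones.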
The discovery of carcinogenic nitrosamine impurities above the safe limits in pharmaceuticals has led to an urgent need to develop methods for extending structure-activity relationship (SAR) analyses from relatively limited datasets, while the level of confidence required in that SAR indicates that there is significant value in investigating the effect of individual substructural features in a statistically robust manner. This is a challenging exercise to perform on a small dataset, since in practice, compounds contain a mixture of different features, which may confound both expert SAR and statistical quantitative structure-activity relationship (QSAR) methods. Isolating the effects of a single structural feature is made difficult due to the confounding effects of other functionality as well as issues relating to determining statistical significance in cases of concurrent statistical tests of a large number of potential variables with a small dataset; a naïve QSAR model does not predict any features to be significant after correction for multiple testing. We propose a variation on Bayesian multiple linear regression to estimate the effects of each feature simultaneously yet independently, taking into account the combinations of features present in the dataset and reducing the impact of multiple testing, showing that some features have a statistically significant impact. This method can be used to provide statistically robust validation of expert SAR approaches to the differences in potency between different structural groupings of nitrosamines. Structural features that lead to the highest and lowest carcinogenic potency can be isolated using this method, and novel nitrosamine compounds can be assigned into potency categories with high accuracy.
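To make the idea of estimating all feature effects jointly more concrete, here is a minimal sketch of textbook Bayesian multiple linear regression with a zero-mean Gaussian prior and known noise variance. It is not the variation proposed in the abstract, whose details are not given here. The feature names, potency values, `noise_sd`, and `prior_sd` are all hypothetical, and the model omits an intercept for brevity.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def bayes_linear_regression(X, y, noise_sd=1.0, prior_sd=1.0):
    """Posterior mean and standard deviation of each coefficient.

    Assumes y = X beta + noise with noise ~ N(0, noise_sd^2) and an
    independent prior beta_j ~ N(0, prior_sd^2) on every coefficient.
    """
    k = len(X[0])
    lam = (noise_sd / prior_sd) ** 2
    # A = X^T X + lam * I; the posterior covariance is noise_sd^2 * A^-1.
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    A_inv = invert(A)
    mean = [sum(A_inv[i][j] * Xty[j] for j in range(k)) for i in range(k)]
    sd = [noise_sd * math.sqrt(A_inv[i][i]) for i in range(k)]
    return mean, sd

# Hypothetical data: rows are compounds, columns flag presence of a feature.
features = ["feature_a", "feature_b"]
X = [[1, 0], [1, 1], [0, 1], [1, 1]]
y = [1.0, 0.2, -0.9, 0.1]  # made-up log-potency shifts, not study data

mean, sd = bayes_linear_regression(X, y, noise_sd=0.5, prior_sd=1.0)
for name, m, s in zip(features, mean, sd):
    # Approximate 95% credible interval from the Gaussian posterior.
    print(f"{name}: {m:+.2f} (95% interval {m - 1.96 * s:+.2f} to {m + 1.96 * s:+.2f})")
```

Because all coefficients are estimated in one model, compounds that carry both features inform both estimates at once, and the prior shrinks weakly supported effects toward zero. A feature would be read as having a credible effect when its interval excludes zero.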