Wider acceptance of QSARs would bring a range of benefits and savings to both the private and public sectors. For this to occur, particularly in regulatory applications, a model's limitations need to be identified. We define the assessment of a model's limitations as encompassing overall prediction accuracy, applicability domain and chance correlation. This review presents a general guideline for assessing a model's limitations, with emphasis on, and examples from, consensus modeling methods. More specifically, we discuss the commonalities and differences between external validation and cross-validation for assessing a model's limitations. We illustrate two common ways of assessing overall prediction accuracy, depending on whether or not the intended application domain is predefined. Since even a high-quality model will predict different chemicals with different confidence, we further use the novel Decision Forest consensus modeling method to demonstrate a means of determining prediction confidence (i.e., certainty for an individual chemical's prediction) and domain extrapolation (i.e., the prediction accuracy for a chemical that is outside the chemistry space defined by the training chemicals). We show that prediction confidence and domain extrapolation are related measures that together determine the applicability domain of a model, and that prediction confidence is the more important measure. Lastly, we emphasize the importance of assessing chance correlation and illustrate it with several examples of models that have a high degree of chance correlation despite cross-validation indicating high prediction accuracy.
Generally, a dataset with a skewed distribution, small data size and/or low signal/noise ratio tends to produce a model with high chance correlation. We conclude that it is imperative to assess all three aspects (i.e., overall accuracy, applicability domain and chance correlation) of a model for the regulatory acceptance of QSARs.
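The abstract does not say how chance correlation was measured, so the following is only a minimal sketch of one common approach, a label-permutation (y-randomization) test, and not the authors' procedure. It compares the cross-validated accuracy obtained with the real activity labels against the accuracies obtained after shuffling the labels. The 1-nearest-neighbour classifier, the toy two-descriptor dataset and all function names are assumptions made for illustration.

```python
import random


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if j == i:
                continue  # the held-out chemical may not vote for itself
            d = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def chance_correlation_test(X, y, n_permutations=200, seed=0):
    """Compare real cross-validated accuracy with a shuffled-label baseline."""
    rng = random.Random(seed)
    real = loo_accuracy(X, y)
    null = []
    for _ in range(n_permutations):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break any true descriptor/activity link
        null.append(loo_accuracy(X, shuffled))
    # Fraction of shuffled runs that match or beat the real model.
    p_value = (1 + sum(a >= real for a in null)) / (1 + n_permutations)
    return real, null, p_value


# Hypothetical toy data: two well-separated groups of "chemicals",
# each described by two made-up descriptors.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(20)]
X += [[data_rng.gauss(4, 1), data_rng.gauss(4, 1)] for _ in range(20)]
y = [0] * 20 + [1] * 20

real_acc, null_accs, p_value = chance_correlation_test(X, y)
print(f"real accuracy: {real_acc:.2f}")
print(f"mean shuffled accuracy: {sum(null_accs) / len(null_accs):.2f}")
print(f"permutation p-value: {p_value:.3f}")
```

If the shuffled-label accuracies come close to the real one, the cross-validated accuracy says little about genuine structure-activity signal, which is the situation the abstract warns about for skewed, small or noisy datasets.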
Some seven years have passed since the U.S. legislature mandated the EPA to develop and implement a screening and testing program for chemicals that may disrupt the delicate endocrine system. The envisioned EPA program has evolved to incorporate a tiered scheme of in vitro and in vivo assays, and has considered QSAR as a viable method to set testing priorities. At the U.S. FDA's National Center for Toxicological Research (NCTR), the Endocrine Disruptor Knowledge Base Project has developed models to predict estrogen and androgen receptor binding. Our approach rationally integrates various QSAR models into a sequential "Four-Phase" scheme according to the strength of each type of model. In four hierarchical phases, models predict the inactive chemicals, which are then eliminated from the pool of chemicals to which increasingly precise but more time-consuming models are subsequently applied. Each phase employs different models selected to work complementarily in representing key activity-determining structural features, so as to keep the rate of false negatives as low as possible, an outcome we view as paramount for regulatory use. This paper discusses the QSAR models developed at NCTR, and particularly how we integrated these models into the "Four-Phase" system, for a number of datasets, including 58,000 chemicals identified by the U.S. EPA.
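The sequential elimination described above can be sketched roughly as follows. This is an illustration of the control flow only: the abstract does not specify the models used in each phase, so the descriptor names (`x1`, `x2`), the thresholds and the `run_phases` helper are hypothetical placeholders, and only two of the four phases are mocked up.

```python
from typing import Callable, Dict, List, Tuple

Chemical = Dict[str, object]
# A phase is a name plus a rule returning True when a chemical is
# predicted inactive and can be dropped from further consideration.
Phase = Tuple[str, Callable[[Chemical], bool]]


def run_phases(
    pool: List[Chemical], phases: List[Phase]
) -> Tuple[List[Chemical], Dict[str, List[Chemical]]]:
    """Apply phases in order; each later phase sees only the survivors."""
    remaining = list(pool)
    eliminated: Dict[str, List[Chemical]] = {}
    for name, is_inactive in phases:
        kept, dropped = [], []
        for chemical in remaining:
            (dropped if is_inactive(chemical) else kept).append(chemical)
        eliminated[name] = dropped
        remaining = kept  # the costlier next phase runs on a smaller pool
    return remaining, eliminated


# Hypothetical chemicals with made-up descriptor values.
pool = [
    {"id": "A", "x1": 0.1, "x2": 0.9},
    {"id": "B", "x1": 0.8, "x2": 0.2},
    {"id": "C", "x1": 0.7, "x2": 0.8},
]

# Placeholder rules standing in for a cheap early model and a more
# precise later one; the thresholds are arbitrary.
phases = [
    ("phase 1", lambda c: c["x1"] < 0.5),
    ("phase 2", lambda c: c["x2"] < 0.5),
]

remaining, eliminated = run_phases(pool, phases)
print([c["id"] for c in remaining])
print({name: [c["id"] for c in dropped] for name, dropped in eliminated.items()})
```

Because every chemical dropped in an early phase is never re-examined, a design like this puts the burden of avoiding false negatives on the earliest, cheapest rules, which is consistent with the emphasis the abstract places on that error type.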