Recent years have seen substantial growth in the adoption of machine learning approaches for quantitative structure-activity relationship (QSAR) development. This trend has coincided with a desire to shift the focus of methodology employed within chemical safety assessment: away from traditional reliance upon animal-intensive in vivo protocols, and towards increased application of in silico (computational) predictive toxicology. With QSAR central amongst the techniques applied in this area, algorithms trained through machine learning for the purpose of toxicity estimation have, quite naturally, emerged. On account of the pattern-recognition capabilities of the underlying methods, the statistical power of the ensuing models is potentially considerable, and appropriate for the handling even of vast, heterogeneous datasets. However, such potency comes at a price, manifesting as practical deficits in the reproducibility, interpretability and generalisability of the resulting tools. Unsurprisingly, these deficits have hindered broader uptake, most notably within a regulatory setting. Areas of uncertainty liable to accompany (and hence detract from the applicability of) toxicological QSAR have previously been highlighted, together with suggestions for “best practice” aimed at mitigating their influence. However, the scope of such exercises has remained limited to “classical” QSAR: that conducted through linear regression and related techniques, with the adoption of comparatively few features or descriptors. Accordingly, the intention of this study has been to extend the remit of best practice guidance so as to address concerns specific to the employment of machine learning within the field.
In doing so, the impact of strategies aimed at enhancing the transparency (feature importance, feature reduction), generalisability (cross-validation) and predictive power (hyperparameter optimisation) of algorithms, trained upon real toxicity data through six common learning approaches, is evaluated.
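As a purely illustrative aside, the pairing of cross-validation with hyperparameter optimisation named above can be sketched in a few lines of standard-library Python. Everything in this sketch is an assumption made for demonstration: the data are synthetic (not the toxicity data of the study), the learner is a simple k-nearest-neighbour regressor (not necessarily one of the six approaches evaluated), and the function names are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, n_neighbors):
    """Average the targets of the n_neighbors closest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), target)
        for row, target in zip(train_X, train_y)
    )
    return mean(target for _, target in dists[:n_neighbors])


def cv_rmse(X, y, n_neighbors, k=5, seed=0):
    """Mean root-mean-square error across k held-out folds."""
    scores = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        sq_err = [
            (knn_predict(train_X, train_y, X[i], n_neighbors) - y[i]) ** 2
            for i in fold
        ]
        scores.append(mean(sq_err) ** 0.5)
    return mean(scores)


def grid_search(X, y, candidates, k=5):
    """Score each candidate hyperparameter by cross-validated RMSE."""
    results = {c: cv_rmse(X, y, c, k) for c in candidates}
    best = min(results, key=results.get)
    return best, results


# Synthetic toy data (an assumption for illustration only):
# two "descriptors" and a noisy linear "activity".
rng = random.Random(42)
X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(60)]
y = [2.0 * a - b + rng.gauss(0, 0.1) for a, b in X]

best_k, scores = grid_search(X, y, candidates=[1, 3, 5, 9])
print("best n_neighbors:", best_k)
```

Note that the cross-validated score of the selected setting is itself optimistically biased, since the same folds were used to choose it; a separate held-out set or nested cross-validation would be needed for an unbiased estimate of generalisability.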
This paper is dedicated to Prof. Paola Gramatica on the occasion of her retirement.
Abstract: Recent years have seen the emergence into circulation of a growing array of novel psychoactive substances (NPS). Knowledge of the pharmacological profiles and risk liability of these compounds is typically very scarce. Development of chemoinformatic tools enabling prediction of properties within uncharacterised analogues has the potential to be of particular use. In order to facilitate this, compilation of a chemical inventory comprising known NPS is a necessity. Drawing upon a variety of published governmental and analytical reports, a dataset composed of 690 distinct acknowledged NPS, complete with defined chemical structures, has been constructed. This is supplemented by a complementary series of 155 established psychoactive drugs of abuse (EPDA). Compounds were classified in accordance with their key molecular structural features, subjective effect profiles and pharmacological mechanisms of action. In excess of forty chemical groupings, spanning seven subjective effect categories and six broad mechanisms of pharmacological action, were identified. Co-occurrence of NPS and EPDA within specific classes was common, showcasing inherent scope both for chemical read-across and for the derivation of structural alerts.