Purpose of review
To provide an accessible overview of some of the most recent trends in the application of machine learning to the field of substance use disorders and their implications for future research and practice.
Recent findings
Machine-learning (ML) techniques have recently been applied to substance use disorder (SUD) data for multiple predictive applications, including detecting current abuse, assessing future risk and predicting treatment success. These models cover a wide range of ML techniques and data types, including physiological measures, longitudinal surveys, treatment outcomes, national surveys, medical records and social media.
Summary
The application of machine-learning models to substance use disorder data shows significant promise. Some use cases and data types achieve high predictive accuracy, particularly models that use physiological and behavioral measures to predict current substance use, which points to potential clinical diagnostic applications. However, these results are uneven: some models perform poorly or at chance, a limitation that likely reflects insufficient data and/or weak validation methods. The field will likely benefit from larger and more multimodal datasets, greater standardization of data recording, rigorous testing protocols, and greater use of modern deep neural network models applied to multimodal unstructured datasets.
Cheminformatics supports chemistry applications that depend on molecular interactions, structural characteristics, and functional properties. The arrival of deep learning and the abundance of easily accessible chemical data from repositories such as PubChem have enabled advances in computer-aided drug discovery. Virtual High-Throughput Screening (vHTS) is one such technique: it integrates chemical domain knowledge to perform in silico biomolecular simulations, but its prediction of binding affinity is restricted by the limited availability of ground-truth binding assay results. Here, text representations of 83,000,000 molecules are leveraged to enable single-target binding affinity prediction directly on the outcome of screening assays. The embedding of an end-to-end Transformer neural network, trained to encode the structural characteristics of a molecule via a text-based translation task, is repurposed through transfer learning to classify binding affinity to a single target. Classifiers trained on the embedding outperform those trained on SMILES strings for multiple tasks, achieving AUC values between 0.67 and 0.99. Visualization reveals an organization of structural and functional properties in the learned embedding that is useful for binding prediction. The proposed model is suitable for parallel computing, enabling rapid screening as a complement to virtual screening techniques when limited data are available.
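The AUC figures quoted above are areas under the ROC curve for binary binding classifiers. As a minimal sketch of what that metric measures, the pure-Python function below computes AUC as the probability that a randomly chosen active molecule receives a higher score than a randomly chosen inactive one. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not values or code from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 (1 = binds the target, 0 = does not).
    scores: classifier scores, higher meaning "more likely to bind".
    Hypothetical helper for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")

    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (assumed, not from the paper): two inactives, two actives.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so the reported range of 0.67 to 0.99 spans modest to near-perfect discrimination depending on the task. The pairwise loop is quadratic in the number of samples; for large screening sets a rank-based implementation would be the more practical choice.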