Semi-supervised
learning has proved its efficacy in utilizing extensive
unlabeled data to reduce the need for large amounts of labeled
data and improve model performance. Despite its tremendous potential,
semi-supervised learning has yet to be implemented in the field of
drug discovery. Empirical testing of drugs and their classification
is costly and time-consuming. In contrast, predicting therapeutic
applications of drugs from their structural formulas using semi-supervised
learning would reduce costs and time significantly. Herein, we employ
a new multicontrastive-based semi-supervised learning algorithm, MultiCon, for
classifying drugs into 12 categories, according to therapeutic applications,
on the basis of image analyses of their structural formulas. By rational
use of data balancing, online augmentations of the drug image data
during training, and the combined use of multicontrastive loss with
consistency regularization, MultiCon achieves higher class prediction
accuracies than state-of-the-art machine learning
methods across a variety of existing semi-supervised learning benchmarks.
In particular, it performs exceptionally well with a limited number
of labeled examples. For instance, with just 5000 labeled drugs in
a PubChem (D3) data set, MultiCon achieved a class prediction
accuracy of 97.74%.
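
As a rough illustration of how a consistency regularization term can be combined with other loss terms, the following minimal sketch computes a generic consistency penalty between the predictions for two augmented views of the same unlabeled image and adds it to a supervised and a contrastive term. It is not the MultiCon implementation: the function names, the mean-squared-error form of the penalty, and the weights `w_contrastive` and `w_consistency` are assumptions made for illustration only, and the contrastive term is passed in as a precomputed number rather than derived here.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(logits_view_a, logits_view_b):
    """Mean squared difference between the class probabilities predicted
    for two augmented views of the same unlabeled example.

    This is a generic form of consistency regularization, assumed here
    for illustration; the source does not specify the exact penalty.
    """
    p = softmax(logits_view_a)
    q = softmax(logits_view_b)
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def combined_loss(supervised, contrastive, consistency,
                  w_contrastive=1.0, w_consistency=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return supervised + w_contrastive * contrastive + w_consistency * consistency


# Example: identical predictions on both views incur no consistency penalty,
# while diverging predictions are penalized.
same = consistency_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
different = consistency_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
total = combined_loss(supervised=1.0, contrastive=0.5, consistency=different)
```

In a sketch like this, the consistency term needs no labels, which is why such penalties are commonly applied to the unlabeled portion of a data set.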