Interest in applying machine learning techniques (MLT) as drug design tools has grown in recent decades, because drug design is very complex and requires the use of hybrid techniques. This paper briefly reviews several MLT: self-organizing maps, multilayer perceptrons, Bayesian neural networks, counter-propagation neural networks and support vector machines. A comparison between the performance of the described methods and some classical statistical methods (such as partial least squares and multiple linear regression) shows that MLT have significant advantages. The number of studies in medicinal chemistry that employ these techniques has increased considerably, in particular those using support vector machines. The state of the art and the future trends of MLT applications encompass the use of these techniques to construct more reliable QSAR models. The models obtained from MLT can be used in virtual screening studies and as filters to develop or discover new chemicals. An important challenge in the drug design field is the prediction of pharmacokinetic and toxicity properties, which can help avoid failures in the clinical phases. Therefore, this review provides a critical point of view on the main MLT and shows their potential as valuable tools in drug design.
ML techniques have been successfully employed in pharmacokinetic studies, where reliable ML models support the complex process of designing new drug candidates. Applications include predicting ADME-Tox properties from quantitative structure-activity relationship studies and discovering new compounds through virtual screening with filters based on results obtained from ML techniques.
Semi-supervised learning is drawing increasing attention in the era of
big data, as the gap between the abundance of cheap, automatically collected
unlabeled data and the scarcity of labeled data that are laborious and expensive to
obtain is dramatically increasing. In this paper, we first introduce a unified view
of density-based clustering algorithms. We then build upon this view and bridge the
areas of semi-supervised clustering and classification under a common umbrella of
density-based techniques. We show that there are close relations between
density-based clustering algorithms and the graph-based approach for transductive
classification. These relations are then used as a basis for a new framework for
semi-supervised classification based on building-blocks from density-based
clustering. This framework is not only efficient and effective, but it is also
statistically sound. In addition, we generalize the core algorithm in our framework,
HDBSCAN*, so that it can also perform semi-supervised clustering by directly taking
advantage of any fraction of labeled data that may be available. Experimental
results on a large collection of datasets show the advantages of the proposed
approach both for semi-supervised classification and for semi-supervised
clustering.
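The link between density-based clustering and graph-based transductive classification can be illustrated with a small sketch. It is not the authors' framework or the HDBSCAN* implementation. It only assumes the standard HDBSCAN* notions of core distance and mutual reachability distance, and it propagates labels greedily along the cheapest mutual-reachability edge that connects a labeled point to an unlabeled one. The function names, the `min_pts` default and the toy data are hypothetical.

```python
import math


def core_distances(points, min_pts):
    """Distance from each point to its min_pts-th nearest neighbor (the point itself counts)."""
    cores = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        cores.append(dists[min_pts - 1])
    return cores


def mutual_reachability(points, min_pts):
    """Pairwise mutual reachability: max(core(i), core(j), d(i, j)); zero on the diagonal."""
    core = core_distances(points, min_pts)
    n = len(points)
    return [
        [
            0.0 if i == j else max(core[i], core[j], math.dist(points[i], points[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


def propagate_labels(points, labels, min_pts=3):
    """Greedy label expansion (illustrative assumption, not the paper's algorithm).

    labels holds a class for each labeled point and None for each unlabeled one.
    At every step the unlabeled point joined to the labeled set by the smallest
    mutual reachability distance inherits the label at the other end of that edge.
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    n = len(points)
    result = list(labels)
    unlabeled = {i for i, lab in enumerate(result) if lab is None}
    if len(unlabeled) == n:
        raise ValueError("at least one labeled point is required")
    mreach = mutual_reachability(points, min_pts)
    while unlabeled:
        # Cheapest edge from the labeled set to the unlabeled set.
        _, target, source = min(
            (mreach[i][j], j, i)
            for i in range(n)
            if i not in unlabeled
            for j in unlabeled
        )
        result[target] = result[source]
        unlabeled.remove(target)
    return result


# Toy data: two well-separated groups with one labeled point each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["a", None, None, None, "b", None, None, None]
predicted = propagate_labels(points, labels, min_pts=2)
print(predicted)
```

On this toy data every point takes the label of the seed in its own dense group, because every within-group mutual reachability distance is far smaller than any distance that crosses the gap between the groups.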
Dipeptidyl peptidase-4 (DPP-4) is an important biological target in the treatment of diabetes: DPP-4 inhibitors can increase insulin levels and prolong the activity of glucagon-like peptide-1 (GLP-1) and gastric inhibitory polypeptide (GIP), making them effective in glycemic control. This study analyses the main molecular interactions between DPP-4 and a series of bioactive ligands. The methodology employed molecular modeling methods, such as HQSAR (Hologram Quantitative Structure-Activity Relationship) analyses and molecular docking, with the aim of understanding the main structural features of the compound series that are essential for the biological activity. The main interactions in the active site of DPP-4 were analysed, in particular the contribution of the water-mediated hydroxyl coordination between Tyr547 and Ser630, which is described in the literature as important for the coordinated interactions in the active site. Significant correlation coefficients were obtained for the best 2D model (r² = 0.942 and q² = 0.836), indicating the predictive power of this model for untested compounds. Therefore, the final model constructed in this study, along with the information from the contribution maps, could be useful in the design of novel DPP-4 ligands with improved activity.
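For readers unfamiliar with the two statistics quoted above, the sketch below shows how r² and a leave-one-out q² are commonly computed. It assumes a toy one-descriptor least-squares model rather than the PLS-based HQSAR model used in the study, and the data values are invented for illustration. It also assumes the convention in which q² is 1 minus PRESS divided by the total sum of squares around the overall mean; some QSAR software uses a slightly different reference mean.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def total_sum_of_squares(ys):
    """Sum of squared deviations of the observed values from their mean."""
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def r_squared(xs, ys):
    """Fitted r2: 1 - SS_res / SS_tot, with the model trained on all points."""
    slope, intercept = fit_line(xs, ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / total_sum_of_squares(ys)


def q_squared_loo(xs, ys):
    """Leave-one-out q2: 1 - PRESS / SS_tot, each point predicted by a model that never saw it."""
    press = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return 1.0 - press / total_sum_of_squares(ys)


# Hypothetical descriptor values and activities (not data from the study).
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
r2 = r_squared(descriptor, activity)
q2 = q_squared_loo(descriptor, activity)
print(f"r2 = {r2:.3f}, q2 = {q2:.3f}")
```

Because each left-out point is predicted by a model that was not fitted to it, q² cannot exceed r² for a least-squares model under this convention, which is why a q² close to r² is read as a sign of predictive power rather than overfitting.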