To minimize expensive drug failures, it is essential to identify potential activity, toxicity and ADME problems as early as possible. Given the large compound libraries now handled by combinatorial chemistry and high-throughput screening, it is advisable to identify potential drugs even before synthesis, using computational techniques such as QSAR modeling. A great number of in silico approaches to activity/toxicity prediction have been described in the literature, using 0D, 1D, 2D and 3D molecular descriptors. These descriptors have also been implemented in available computational tools such as DRAGON, SYBYL and CODESSA for ease of use. However, many of them have been applied to only a few prediction problems. This review attempts to summarize present knowledge on the computational prediction of biological activity based on the 2D molecular descriptors implemented in the DRAGON software. These applications rely on new computational techniques such as virtual combinatorial synthesis, virtual computational screening or inverse QSAR. Several applications of topological molecular descriptors are described, ranging from simple topological indices to topological indices derived from matrices weighted with atomic and bond properties. Their advantages, limitations and possibilities in drug design are also discussed.
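
To make the notion of a "simple topological index" concrete, here is a minimal sketch that computes the Wiener index, a classic 2D descriptor defined as the sum of shortest-path distances between all pairs of atoms in the hydrogen-suppressed molecular graph. The function name, the adjacency-list encoding and the two example molecules are assumptions chosen for illustration. The sketch is not taken from the review and does not reproduce how DRAGON computes its descriptors.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path (bond-count) distances over all unordered atom pairs.

    adjacency: dict mapping each heavy-atom index to a list of bonded atom indices
    (hypothetical encoding of a connected, hydrogen-suppressed molecular graph).
    """
    total = 0
    for source in adjacency:
        # Breadth-first search gives topological distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor not in dist:
                    dist[neighbor] = dist[atom] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
    return total // 2  # every pair was counted once from each end


# Carbon skeletons of two C4H10 isomers.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wiener_index(n_butane))   # 10
print(wiener_index(isobutane))  # 9
```

The two isomers share a molecular formula but receive different index values, which is the kind of structural information a topological descriptor is meant to capture. The weighted variants mentioned above would replace the unit bond distances with atomic or bond properties.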
Variable selection is a procedure for choosing the most important features so that as much information as possible is obtained from a reduced number of features. The selection stage is crucial: the subsequent quantitative structure-activity relationship (QSAR) model (regression or discriminant) will perform poorly if features of little significance are selected. In the modern era of drug design, combinatorial chemistry and high-throughput screening have generated an unprecedented amount of experimental information. In addition, many molecular descriptors have been defined in the last two decades. All this information can be analyzed by QSAR techniques using adequate statistical procedures. These techniques and procedures should be fast, automated, and applicable to large data sets of structurally diverse compounds. Given the large number of variable selection techniques now available, identifying the best one is a very difficult task. This review summarizes some of the present knowledge concerning variable selection methods applied to well-known statistical techniques such as linear regression, PLS, kNN and artificial neural networks, with the aim of disseminating advances in this important stage of QSAR model building.
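
As an illustration of what a variable selection step can look like for linear regression, the following is a minimal sketch of greedy forward selection: at each step it adds the descriptor that most reduces the residual sum of squares of an ordinary least-squares fit. It is only one of many possible techniques and is not claimed to be the method favored by the review. The function names, the synthetic descriptor matrix and the choice of two informative columns are all assumptions made for the example.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def forward_selection(columns, y, max_features):
    """Greedy forward selection for ordinary least squares.

    columns: list of descriptor columns, each a list of floats (hypothetical layout)
    y: list of activity values
    Returns the indices of the chosen columns in the order they were selected.
    """
    # Centering the response and descriptors is equivalent to fitting an intercept.
    resid = centered(y)
    work = {j: centered(col) for j, col in enumerate(columns)}
    selected = []
    while work and len(selected) < max_features:
        # Adding column j lowers the residual sum of squares by
        # (resid . x_j)^2 / (x_j . x_j), with x_j orthogonal to chosen columns.
        def gain(j):
            norm = dot(work[j], work[j])
            return dot(resid, work[j]) ** 2 / norm if norm > 1e-12 else 0.0

        best = max(work, key=gain)
        xb = work.pop(best)
        norm_b = dot(xb, xb)
        if norm_b <= 1e-12:
            break  # nothing informative is left
        selected.append(best)
        # Project the chosen direction out of the residual and remaining columns.
        beta = dot(resid, xb) / norm_b
        resid = [r - beta * x for r, x in zip(resid, xb)]
        for j in list(work):
            proj = dot(work[j], xb) / norm_b
            work[j] = [c - proj * x for c, x in zip(work[j], xb)]
    return selected


# Synthetic data (an assumption): 8 random descriptors, only columns 2 and 5 matter.
random.seed(0)
n = 60
columns = [[random.gauss(0, 1) for _ in range(n)] for _ in range(8)]
y = [3.0 * columns[2][i] - 2.0 * columns[5][i] + random.gauss(0, 0.1) for i in range(n)]

selected = forward_selection(columns, y, max_features=2)
print(sorted(selected))
```

On real QSAR data the stopping rule would normally be driven by cross-validation or an information criterion rather than a fixed `max_features`, since greedy selection on many descriptors can easily overfit.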